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Abstract 



One of the most active areas of physics in the last decades has been that of critical phenomena, and Monte Carlo 
simulations have played an important role as a guide for the validation and prediction of system properties close 
to the critical points. The kind of phase transitions occurring for the Betts lattice (lattice constructed removing 
1/7 of the sites from the triangular- lattice) have been studied before with the Potts model for the values 
q = 3, fenomagnetic and antifen^omagnetic regime. Here, we add up to this research line the feiTomagnetic 
case for ^7 = 4 and 5. In the first case, the critical exponents are estimated for the second order transition, 
whereas for the latter case the histogram method is applied for the occurring first order transition. Additionally, 
Domany's Monte Carlo based clustering technique mainly used to group genes similar in their expression levels 
is reviewed. Finally, a control theory tool -an adaptive observer- is applied to estimate the exponent parameter 
involved in the well-known Gompertz curve. By treating all these subjects our aim is to stress the importance 
of cooperation between distinct discipUnes in addressing the complex problems arising in biology. 
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Introduction 



"Minerals grow, plants grow and live, 
animals grow, live and have feeling." 
Linnaeus, "Systema Naturae", 1735 

Monte Carlo simulations have been used for many years to study the properties of physical models, and have 
also played a significant role in statistics, biology, computer science and other fields, demonstrating its versality 
and powerful approach. Furthermore, many advances in computation algorithms and computer technology 
have made possible to study systems which would be impossible to examine only a few years ago. The 
first part of this thesis aims to give a brief explanation of the Monte Carlo method, a review of the principal 
algorithms used, the study of phase transitions, finite size scaling theory and finally, some results obtained with 
the Potts model for a recently proposed lattice named Betts or Maple Leaf lattice. 

Since the discovery of the helical structure of DNA and various complete genome sequences, biology has seen 
also an enormous advance. However, it seems that the only way to solve the complex problems raised in the 
study of biological systems is to share the challenge with other scientific disciplines such as chemistry, physics, 
and computer science. Research on cancer is one of the most important and interesting subjects in Biology. 
This terrible disease has received tremendous attention in the last part of the XX century, because of the huge 
amount of cases and the technological advances in analysis and medical treatment of tumours. Despite the 
efforts of the international scientific community, there are many unanswered questions related to the evolution 
of the cancer diseases, the causes that trigger them, the prediction of drugs and treatments effects, and the 
development of an effective cure. The introduction of the Monte Carlo method into biological problems has 
brought interesting results including the modeling of the structure and evolution of a epidermis cell nuclei, 
reproducing cancer growth. 

The second chapter reviews the clustering techniques commonly used to group genes with similar behaviour in 
their expressions across various experiments, which helps in the construction of genetic networks and targeting 
of genes involved in diseases like cancer. The superparamagnetic gene clustering algorithm is also explained 
as an example of a clustering technique that employs the Monte Carlo method and is based on a physical 
phenomenom, leaving the subject to future implementation. 

On the other hand, mathematical procedures, in particular models based on differential equations whose terms 
can represent not only the growth rate of a tumour, but also the growth or inhibition rates of substances existing 
in the medium or cell-cell interactions, provide an excellent tool to describe biological processes. There also 
exist empirical models that have proved to be very useful in fitting the experimental growth curves of tumours. 
The Gompertz model is a famous one, although there is not a convincing explanation of why it works so well. 
The Gompertz growth law has been introduced by Benjamin Gompertz in 1825 in his demographical studies, 
and in mathematical terms is written: 

A(a)=V^^ (1) 

where A (a) is the mortality rate. 

The main problem is that the biological interpretation of its characteristic parameters is not very well settled. 
A link of these parameters with the biological phenomenology, if found, would make the Gompertz model 
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extremely valuable as a predictive tool. The third part of this thesis discusses some of the most important 
models based on differential equations and gives a more complete idea about the formulation and applications 
of the Gompertz model, and finally presents a method based on control theory capable of accurately predict 
the first stages of Gompertz growth. 

The main purpose of this work is to emphasize the importance of an interdisciplinary research. Nowadays, it is 
clear- that many problems inherent to the biology field need to be adressed with tools coming from areas such 
as computational physics and apphed mathematics. 
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1.1 Brief History of the Monte Carlo Method 



The first electronic computer, ENIAC, was developed during the World War II period by a group of scientists 
working at the University of Pennsylvania in Philadelphia. They had realized that if electronic circuits could be 
made to count, then they could do arithmetic and hence, solve difference equations at incredible speeds. This 
would lead to a scientific revolution because it would give the possibility to study problems unsolved before 
due to the large amount of calculations needed. 

In 1946, Stanislaw Ulam, a mathematician working in Los Alamos, attended a conference about a preliminary 
computational model of a thermonuclear reaction probed in ENIAC as a test for the computer. Like other 
scientists, he was impressed by the speed and versatility of the ENIAC. Additionally, Ulam's extensive 
mathematical background made him aware that statistical sampling techniques that had fallen into disuse 
because of tediousness of calculations, could be resuscitated with ENIAC. The basis of the Monte Carlo method 
has been proposed later by him as a consequence of his interest in random processes. As Stan Ulam mentioned 
in 1983, his first thoughts and attempts to practice the Monte Carlo method were suggested by a question that 
occurred to him in 1946 as he was playing solitaires. The question was what were the chances that a Canfield 
solitaire laid out with 52 cards will come out succcessfuUy? ^ After spending a lot of time trying to estimate 
them by pure combinatorial calculations, he wondered whether a more practical method might not be to lay 
it out say one hundred times and simply observe and count the number of successful plays. He immediately 
thought about how to change processes described by certain differential equations into an equivalent form 
interpretable as a succession of random operations Ulam discussed his ideas with John von Neumann, 
Professor of Mathematics at the Institute for Advanced Study at Princeton, who was also a consultant to Los 
Alamos and one of the principals participating in the ENIAC probe conference in 1946. Von Neumann saw 
the importance of Ulam's approach and thought that it seemed especially suitable for exploring the behaviour 
of neutron chain reactions in fission devices. In March 1947, von Neumann wrote to Robert Richtmyer, the 
Leader of the Theoretical Division at Los Alamos, describing a possible statistical method to solve the problem 
of neutron diffusion in fissionable material using the newly developed electronic computing techniques. It was 
at that time when Nicholas Metropolis suggested the name Monte Carlo for this statistical method. It was 
related to the fact that Stan had an uncle who would borrow money from relatives because he "just had to go to 
Monte Carlo" f2\ and also because of the similarities between the method and the games of chance abundant 
in the capital of Monaco, the european center of gambUng. 

Very similar methods, not fully developed, had been used earlier. An example is Buffon's needle problem, an 
experiment performed in the of the eighteenth century, which represents one of the first problems in geometric 
probability. It consists in throwing a needle randomly on a board with pai^allel lines, and inferring the value of 
71 from the number of times the needle intersects a line |f3l; nowadays, Buffon's needle problem is practically 
solved by Monte Carlo integration. Descriptions of several modern Monte Carlo techniques appear in a paper 
by Kelvin |4l, written nearly one hundred years ago, in the context of a discussion on the Boltzmann equation. 
In the I940's, Enrico Fermi also used Monte Carlo in the calculation of neutron diffusion, and later designed 
the Fermiac, a Monte Carlo mechanical device used in the calculation of criticality in nuclear reactors 13. 
Ulam's contribution was to recognize the potential for the newly invented electronic computer to automate 
such sampling. 

The approach proposed by von Neumann in his letter was the first formulation of a Monte Carlo computation 
for an electronic machine. Von Neumann considered a spherical core of fissionable material surrounded by 
a shell of normal material, and the idea was to trace out the development of neutrons using random digits to 
select the outcomes of the various interactions along the way, such as scattering, absorption and fission. For 
example, once a neutron is selected to have an initial position with certain velocity, you have to decide the 
position of the first collision and the nature of the collision. If you select a fission to occur, then the number 
of emerging neutrons must be chosen, and each of the new neutrons is followed too. On the other hand, if you 
decide that the outcome of the collision is scattering, the new momentum of the neutron must be determined. 

'Today is quite well known that the chance of winning is low: 3.3% (www.games.solitaire.com) 



If the neutron crosses a material boundary, the characteristics of the new medium must be taken into account. 
At the end, a genealogical history of a neutron emerges. The same procedure is carried out for other neutrons 
until a statistically valid picture is obtained. Each neutron history is analogous to a single game of solitaire, 
and the use of random numbers to make the choices along the way is analogous to the random turn of the card. 

To take decisions, the computer must have an algorithm for generating a uniformly distributed set of random 
numbers and these numbers must be transformed into the nonuniform distribution, say g, desired for the 
property of interest. In a 1947 letter, von Neumann discussed two techniques for using uniform distributions of 
random numbers to generate g. The first technique, which had already been proposed by Ulam, shown that the 
function / needed to achieve this transformation is just the inverse of the nonuniform distribution function, that 
is, / = ^ For example, in the case of neutron physics, the distribution of free paths (how far neutrons of a 
given energy in a given material go before colliding with a nucleus) decreases exponentially with distance. If x 
is uniformly distributed in the open interval (0, 1), then f = — Inx will give us a nonuniform distribution g with 
just those properties. The rest of von Neumann letter describes an alternative technique that works when it is 
difficult or computationally expensive to form the inverse function, which is frequently true when the desired 
function is empirical. In this approach, two uniform and independent distributions {xi) and (j,) are used. If two 
numbers Xi and are selected randomly from the domain and range, respectively, of the function /, then each 
such pair of numbers represents a point in the function's coordinate plane {xi,yi). When yi > f{xi) the point 
lies above the curve for fix), and x, is rejected; when yt < f{xi) the point lies on or below the curve, and x,- is 
accepted (see Fig. Il.lt . Thus the fraction of accepted points is equal to the fraction of the area below the curve. 
In fact, the proportion of points selected that fall in a small interval along the x-axis will be proportional to 
the average height of the curve in that interval, ensuring generation of random numbers that mirror the desired 
distribution yj. 



Reject Xi since yi > fCxJ 



I. /\ L ^1 

\ 




Accept X2 since < fCxJ 



Figure 1.1: Generation of random numbers that mirror a given distribution /(x) fDl . 



The first ambitious test of the Monte Carlo method consisted of nine problems in neutron transport, each one 
corresponding to various configurations of materials, initial distributions of neutrons, and running times. These 
problems did not include hydrodynamic and radiative effects, but complex geometries and realistic neutron- 
velocity spectra were handled easily. Neutron histories were checked with a variety of statistical analyzes and 
comparisons with other approaches. Conclusions about the efficiency of the method were quite favourable and 
gave rise to enthusiasm among scientists of distinct areas. At Los Alamos, the method was quickly adopted 



to study problems of thermonucleai^ and fission devices. Already in 1948, Ulam was able to report to the 
Atomic Energy Commission about the applicability of the method for cosmic rays and in the area of the 
Hamilton Jacobi partial differential equation. Other laboratory staff members started to run Monte Carlo codes 
in ENIAC. Among them, J. Calkin, C. Evans and F. Evans studied thermonuclear problems, and B. Suydam 
and R. Stark tested the concept of artificial viscosity for time-dependent shocks. By midyear 1949, Ulam and 
Metropolis published a paper describing the Monte Carlo method and its application to integro-differential 
equations |6l and the first symposium on the method was held in Los Angeles. 

The construction of a new machine began later and N. Metropolis was the leader of the group in charge of it. 
He called the new machine MANIAC wishing to stop the use of acronyms for machine names, but contrary 
to what he sought, it only stimulated it. In eaiiy 1952, the MANIAC became operational at Los Alamos and 
soon after, Anthony Turkevich led a study of the nuclear cascades resulting from the collision of accelerated 
particles with atomic nuclei. Another computational problem run on the MANIAC was a study of equations of 
state based on the two-dimensional motion of hard spheres. The results were published in a famous paper in 
1953 Q and describes a strategy leading to greater computational efficiency for equilibrium systems obeying 
the Boltzmann distribution function. The idea developed in that is that if a move of a particle in the system 
causes a decrease in the total energy, the new configuration should be accepted. On the other hand, if there is 
an increase in energy, the new configuration is accepted only if it passes through a game of chances biased by 
a Boltzmann factor, otherwise, the old configuration is kept. 

Since then, the Monte Carlo method has been proved to be a very powerful and useful tool. For example, 
deterministic methods for numerical integration of functions with many variables are very inefficient because 
with every additional dimension or variable, an exponential time increase takes place. The alternative way 
provided by the Monte Carlo method is the following: the function in question can be estimated by randomly 
selecting points in the many dimensional space and taking some kind of average of the values of the function at 
these points. This method will display 1 / \fN convergence, i.e. quadrupling the number of sampled points will 
halve the enw, regardless of the number of dimensions. The use of Monte Carlo methods to model physical 
problems allows us to examine more complex systems that otherwise we are not able to handle. Solving 
equations which describe the interactions between two atoms is fairly simple but solving the same equations 
for hundreds or thousands of atoms is impossible. With Monte Carlo methods, a large system can be sampled 
in a number of random configurations, and those data can be used to describe the system as a whole. There are 
currently many applications of the Monte Carlo method: stellar evolution fSl, reactor design {9^, cancer therapy 
iflOJ . traffic flow yjj, finance 1121 . simulations of various systems of interacting particles (e.g. ferromagnetic 
materials), grain growth modeling in metallic alloys |[T3llT4l . the behaviour of nanosti'uctures and polymers 
lITSI . and protein structure predictions fl^ . 

1.2 Basics of the Monte Carlo Method 

In statistical mechanics, the partition function Z{H,T) contains all the necessary information to calculate the 
thermodynamic properties of a system. The difficulty arise when the size of the system and the number of 
degrees of freedom for each particle is large, something that occurs in almost all cases. Then, summing over 
the large number of possible states to calculate Z{H, T) is extremely expensive and almost impossible even in 
a computational way. The result is that, in general, the partition function can not be evaluated exactly il7J . 

The Monte Carlo approach consists of generating a series of possible states or configurations Xi,X2, of 
a system {Xj = {xi,X2...} with Xj being the position of the particles in the system), so that the probability P^, 
of encountering the system in state Xj, is given by an appropriate probability density function. Averages over 
phase space may be constructed by considering a large number of identical systems which are held at the same 
fixed conditions. These are called ensembles, (Fig. I1.2I) . and depending on the pai^ameters held fixed, one can 
have different types of ensembles. In the case in which T is maintained constant, the set of systems obtained 
is said to belong to the canonical ensemble, in which the systems are allowed to have distinct energies. On 
the other hand, if the energy is fixed, the ensemble is called the microcanonical ensemble. In both cases the 



number of particles is also fixed, but if now we allow the number of particles to fluctuate, the ensemble is 
named the grand canonical ensemble lITTl . 




Figure 1.2: Graphical representation of a canonical ensemble: the positions of the particles and the energy can 
change in each system, but the number of particles and the temperature is fixed. 



In the canonical approach, Z{H,T) is calculated in the following way: 

Z{HJ)= £ e-''/'^^ (1.1) 

all states 

where ks is the Boltzmann's constant, H denotes the Hamiltonian and T the temperature of the system. By all 
states we mean taking into the sum all the available configurations for the system. The probability distribution 
is called a canonical distribution if it is given according to the equation: 

^-H(X)/kBT 

(1.2) 



Z{H,T) 

The general goal is to determine equilibrium properties of the canonical ensemble such as energy and 
magnetization. If m{X) is the value of some physical property in a state X, and H{X) the energy of this 
state, then the canonical ensemble average for the quantity m is given by: 

W = £--4^)£^. (13, 

^ ' Z{H,T) 

As mentioned before, the problem is how to calculate Z{H, T) in an efficient way. 

If we have a finite state space St^ , where X(f) is the state of the system at time t, that can only take s discrete 
values X(,) G = {Xi,X2, ...jXi}, the stochastic process is called a Markov chain if the following condition 
is fulfilled: 

where P(X(()|X(f_i), ...,X(i)) is the probability of the state X(f) to occur conditioned by the occurrence of the 
past states X((_i), ...,X(i). is known as the transition probability matrix. The chain is homogeneous if the 
transition probability Tm = Im(X(,)|X(,_i)) is constant for all t, with Y.x^,) TM{^t\^(t-\)) = 1 for any t. That is, 
the evolution of the chain in the state space depends solely on the cun^ent state of the chain and a fixed 
transition (probability) matrix fTSll . 

For any starting point, the chain will converge to an invariant distribution P{X), as long as Tm is a stochastic 
transition matrix with the following properties: 

1. Irreducibility: for any state of the Markov chain, there is a positive probability of visiting all other states. 
That is, the matrix Tm can not be reduced to smaller matrices, which is also the same as stating that the 
transition graph is connected. 



2. Aperiodicity: the chain should not get trapped in cycles fTSll . i.e., the system should not be limited to a 
subchain of states. 

Consider now a large collection of copies of the same system in equilibrium. We allow each copy to evolve in 
time and, at any instant, we will find each different copy in one possible configuration, an all the copies will 
give a probability distribution over the configuration space. For each point X, in the configuration space, the 
probability P of finding a copy in X at time t satisfies the equation: 

jP{^,t) = Y\P{Xi,t)TM{Xi^X)-P{X,t)TM{X^Xi)]. (1.4) 

Tm{^ Xi) and 7m(X, — > X) are the probabilities of making a transition from the configuration X to X, and 
viceversa. Because the collection is in equilibrium, the probability distribution is time invariant, and in the last 
equation we must have dP{X,t)/dt = for all t. At any instant, there is an equal number of transitions to and 
from the configuration X. In fact, there exists an equation like (ll.4t for each point in the configuration space, 
and the set of all such equations forms the master equation fl^ . 

A sufficient (but not necessary) condition for an equilibrium (time independent) probability distribution needed 
to simulate equilibrium systems is the so-called detailed balance condition for the master equation that relates 
the transition between two configurations, and X„ through: 

p(x")rM(x("-i)|xW) =p(x("-i))rM(xW|x("-i)). (i.5) 

This method can be used for any probability distribution of configurations. If we choose the Boltzmann 
distribution, for which the probability of finding a configuration X with energy H at equilibrium is given 
by (II. 2t . and substitute it into (II. 5I) . we get: 

T i'y("-1)| „-H(n-\)lkBT 

rM(X(")|X(«-l)) e-H(n)lksT ■ ^'■'^> 

This is the detailed balance condition on the transition probabilities. It is very important to note that Z{H, T) 
does not appear in this expression; it only involves quantities that we know {kgT) or that can be easily 
calculated (£"). 

Thus, we have a valid Monte Carlo algorithm if we generate a new configuration X(„) from a previous one 
X(„_i ) such that the transition probability satisfies the detailed balance condition, and the generation procedure 
is ergodic, i.e. every configuration can be reached from every other configuration in a finite number of iterations 

EOl. 

1.3 Measurements Using the Monte Carlo Method 

Systems generated using a valid Monte Carlo algorithm are often held at fixed values of intensive variables, 
such as temperature, pressure, and so on. The corresponding conjugate extensive variables (energy, volume, 
etc.) will fluctuate in time; indeed these fluctuations will actually be observed during the Monte Carlo 
simulations and will help us to measure quantities of interest such as: 

Specific heat 
Susceptibility 

x = ^ (If )^ = ^^^^'^ = ^^'^ - ^1-^) 

These and other similar quantities are measured for each configuration and the averages and statistical errors 
calculated fT7\ . 



Summarizing, the idea of Monte Carlo simulations is to create an independently and identically distributed set 
of samples from a target density P{X) distribution function defined on a high dimensional state space ^ 
(e.g., the set of possible configurations of a system). These N samples can be used to approximate P{X) fTSll . 

When P{X) has a standard form, e.g., Gaussian, it is straightforward to sample from it using easily available 
routines. However, when this is not the case, we need to introduce more sophisticated techniques such as 
Mai^kov Chain Monte Carlo (MCMC) briefly presented above, which is a strategy of generating samples using 
a Markov chain mechanism while exploring the state space S'. This mechanism is constructed with the 
condition that the chain spends more time in the most important regions. In particular, it is constructed so that 
the samples mimic samples drawn from the target density distribution P{X) flS\. 

1.4 Ising and Potts Models 

The Ising model was proposed in 1925, in the doctoral thesis of Ernst Ising, a student of Wilhelm Lenz ll2Tl . 
Using a model proposed by Lenz in 1920 Ising tried to explain certain empirically observed facts about 
ferromagnetic materials in his thesis. The model was referred to in a paper by Heisenberg of 1928 in which 
he used the exchange mechanism to describe ferromagnetism L23J . After the publication of a paper by Peierls 
(1936) l24l . in which he gave a non-rigorous proof that spontaneous magnetization must exist, the Ising model 
became a well-established paradigm. In 1941, Kramers and Wannier calculated the Curie temperature using a 
two-dimensional Ising model ll25l and three years later Onsager gave a complete analytic solution of the model 
Mi- 

As a paradigm of statistical mechanics, the Ising model tries to imitate systems in which individual elements 
(e.g., atoms, animals, protein folds, biological membrane, social behaviour, etc.) modify their behaviour so as 
to conform to the dynamics of other elements in their neighbourhood ETl . 

In most specific terms, the Ising model in statistical mechanics considers a system with spins located at the 
sites of a D-dimensional lattice, where each spin can take the value -i-l, corresponding to spin up, or the value 
-1, corresponding to spin down. The Hamiltonian of such a spin lattice system is given by: 

//7 = -7£a,a,-B£cT„ (1.9) 

where / is the exchange constant, and a,- and Oj are the spins of the and sites respectively. The sites are 
usually a pair of nearest neighbours, though calculations for more distant neighbours can also be carried out. 
B is an externally applied magnetic field with whom each spin interacts. 

H 

I r 

Figure 1.3: Lattice representations of Ising and Potts models. The red site interacts with his first neighbours 
(in yellow). Notice that in the Potts model, being a generalization of the Ising model, more than two possible 
directions for the spin are available. 




When / > 0, the model describes a feiTomagnetic system where pai^allel spins aie favoured and antiparallel 
spins are discouraged. 

In the case of / < 0, an antiferromagnetic system is modeled. 

If J is randomly chosen to be 1 or - 1 for each pair of nearest neighbours and remain fixed during the course of 
observation, we obtain a model of a spin glass 1281 . 

The energy associated with each state depends then on the exchange energy of the particles and the interaction 
of the particles with the external magnetic field. However, in the absence of the external field, the energy of 
the system depends only on the spin exchange energy: 

Hi = -jY,OiOj. (1.10) 

iU) 

The Potts model is a generalization of the Ising model, in which spins can choose its value from a discrete set 
of states (see Fig. \l.3l . In 1952, C. Domb proposed it as a doctoral thesis for his student R. Potts ll29l . Without 
the presence of an external field the Potts model is defined through the Hamiltonian: 

= -■/ £ da.^ajOiOj, (1.1 1) 

(iJ) 

where J denotes again the interaction exchange constant between nearest neighbours and the values a, ai^e 
characterized by an integer a, = 1,2, If two spins are parallel they contribute with energy J, otherwise 
their energy contribution is null. 

1.5 Some Monte Carlo Algorithms: Metropolis, Swendsen-Wang and Wolff 

The Metropolis Q, Swendsen-Wang lOUl and Wolff BDl algorithms satisfy the master equation and the 
detailed balance condition for the Boltzmann distribution. Consequently, when the system reaches equilibrium, 
the probability distribution of all possible configurations will be the Boltzmann distribution. 

The steps of the Metropolis algorithm for an Ising model ai^e graphically represented in Fig. 1.7 and are the 
following: 

1. Start with an arbitrary spin configuration Co of a lattice with sites. 

2. Select a spin randomly and independently, and flip it. 

3. Calculate the energy change AE which results if the spin is turned. 

4. Generate a random number r such that < r < 1. 

5. If AE < 0, accept the change; if AE > 0, the configuration is accepted with a probability e-^E/keT ^ j^^^ 
is resumed as: if r < g-^^/^s^ the spin is flipped. If not, the new configuration is rejected, and the system 
returns to the initial configuration Cq. 

6. Choose randomly another spin to flip and go to (3). 

It is important to discard some configurations at the beginning of the chain of configurations to ensure that 
the system forgets Co and that the configurations taken into account form a canonical ensemble. Then, after 
a considerable number of spins have been updated, the properties of the system are determined and added to 
the statistical average which is stored. The random number r must be chosen uniformly in the interval [0, 1] 
and all the successive random numbers should be uncorrelated. Note that if a spin trial is rejected, the old 
state is counted again for the averages. For a q state Potts model, the new value for the chosen spin is selected 
randomly among the other q — I spin values lfT7| . 



Figure 1.4: Metropolis algo- 
rithm: If the energy decreases 
with the spin flip, the new con- 
figuration remains. If not, is ac- 
cepted or rejected with certain 
probability. 



In the Metropolis algorithm, spins are updated one at a time and this single spin flip is the reason why this 
algorithm is inefficient at critical points where the phenomenon of slowing down occurs. The standard measure 
of Monte Carlo time is the Monte Carlo step per site (MCS/site), which corresponds to trial flips, regardless 
of whether the trial is successful or not (A'^ is the total number of spins in the system) |[T9l . 

The Swendsen-Wang and Wolff algorithms are cluster algorithms, where groups of spins are identified by 
establishing bonds between pairs of neighbouring spins. Once the clusters in the lattice aie identified, a whole 
spin cluster is updated, and in this way these algorithms are more efficient near critical points. 

The Swendsen Wang algorithm for a q state Potts model is (Fig. II. 5t : 

1. Initialize the lattice of A'^ sites with an arbitrary spin configuration Cq. 

2. Examine every pair of neighbouring spins in the system. If neighbouring spins are not parallel, nothing 
is done. If they are parallel, a bond is introduced between them with probability p = \ — e^^, where 
K = J/ksT. (If p < I, a random number r is generated such that < r < 1, and if r < p a bond is 
introduced between sites / and j). 

3. Once all clusters in the lattice have been formed, an arbitrary cluster is chosen. 

4. Another random number R is generated such that I < R <q. 

5. All spins in the chosen cluster are assigned a, = R. 

6. Another cluster is selected randomly and return to (4). 

7. When all clusters have been considered, erase the bonds, go to (2) and repeat the steps until the desired 
number of configurations has been obtained. 




i 




Figure 1.5: Swendsen-Wang algorithm: Once the clusters are formed (each one is represented by a diferent 
colour), their spin values are randomly modified. Some clusters maintain the same value (i.e., orange spin). 
After that, the cluster formation starts again. 



One Monte Carlo cycle in the Swendsen-Wang algorithm is accomplished when all clusters have been updated 
(steps 2-6), and is equivalent to one Monte Carlo step per site (MCS/site) in the MetropoUs algorithm i\9i . 

The probability to set a bond between two sites depends on the temperature, which affects the resultant cluster 
distribution. At very high temperature, the clusters will tend to be quite small, whereas at very low temperature 
virtually all sites with nearest neighbours in the same state will belong to the same cluster and therefore there 
will be a tendency for the system to oscillate back and forth between quite structures. However, near a critical 
point, a quite rich array of clusters is produced and the net result is that each configuration differs substantially 
from its previous one. That is the main reason why the critical slowing down is reduced (Vfl . 

The Wolff algorithm is very similar to the Swendsen-Wang algorithm, the principal difference being that it 
flips the spins of one particular cluster with the maximum probability of 1 in each Wolff MC cycle. The 
Wolff algorithm was proposed to improve the Swendsen Wang algorithm in which significant effort is required 
in dealing with small clusters as well as large ones. However, the small clusters do not contribute to the 
critical slowing down (VJl and can be disregarded. The Wolff algorithm is given by the following procedure (a 
graphical representation is provided in Fig. 11.61) : 

1. Start with an arbitrary spin configuration Co of a lattice with A'^ sites. 

2. Randomly choose a spin to be the seed of a cluster. 

3. Examine all its neighbours and draw bonds with probability p = \ — e^^^'^K 

4. If bonds have been drawn to any nearest neighbour site j, draw bonds to all nearest neighbours k of site 
j with probability p = \ — g^^^j^*. 

5. Repeat step (4) until no more new bonds are created. 

6. Flip all spins in the cluster to a different randomly chosen spin value. 

7. Go back to (1). 

The measurement of Monte Carlo time is more complicated. The natural unit of time is the number of cluster 
flips. However, in one cluster flip the number of spins visited is not equivalent to the total number of spins 
in the system and hence one Wolff cluster flip is not equivalent to one MC step per spin (MCS/site) or one 
MC cycle in the Metropolis and Swendsen-Wang algorithms. The generally accepted method of converting to 
MCS/site is to normalize the number of cluster flips by the mean fraction of sites (c) flipped at each step. The 
Monte Carlo time then becomes well defined if (c) is well defined, and this happens only after enough flips 
have occurred (HI- 




Figure 1.6: Wolff algorithm: A spin is chosen randomly, and the cluster is formed from it by introducing 
bonds to its neighbours and the neighbours of its neighbours with some given probability. The spin value of 
the cluster is changed and then another spin is selected to start a new cluster. 

Although all these algorithms satisfy detailed balance, they do not give the same results for M and ;^ in a 
simulation. This difference is due to the very small probability for M to change sign using the Metropolis 



algorithm for large systems, at low temperatures. This con^esponds to a physical situation, and one can 
calculate (M) and and obtain meaningful results. However in cluster algorithms, the clusters become very 
large at low temperatures, and by flipping them, we effectively flip the whole system, yielding (M) = 0; the 
variance in M is then simply (M^), a constant at low temperatures, which in turn gives a diverging Xm as 
r — > 0. The solution is to use \M\ instead of M, and define ;^|^| just as we defined % earlier. In this way, all 
three algorithms give the same results for (M) and X|m| temperatures [19'] . 

Notice that cluster algorithms become inefficient at low temperatures, because in that situation, nearly all 
spins in the system are flipped when we flip the largest cluster, which is not helpful in achieving statistically 
independent configurations. In comparison, the Metropolis algorithm will be much more efficient fTOll . 

Once an appropiate algorithm has been selected, one of the goals of Monte Carlo simulations is the study of 
the behaviour of systems in phase transitions. 

1.6 Phase Transitions and Critical Exponents 

One of the most common physical problems studied in simulations are phase transitions. A phase transition 
occurs when a thermodynamic system passes from one phase to another one with the change of some external 
variable, such as temperature or pressure. Some examples are the transitions between solid, liquid, and gaseous 
phases, the transition between the ferromagnetic and paramagnetic phases of magnetic materials, and the 
emergence superconductivity in certain metals when they are cooled below a critical temperature l32ll . 

When a system goes from one phase to another, there will be in general a stage where the free energy is not 
analytic. Due to this, the free energies on either side of the transition are two different functions, so one or 
more thermodynamic properties will behave very differently after the transition. A system near or at the critical 
point of a phase transition presents peculiar behaviours that are universal, like divergence of some quantities 
and critical slowing down phenomena, which will be explained later. The most commonly examined property 
in this context is the heat capacity that in the transition region may become infinite, jump abruptly to a different 
value, or exhibit a discontinuity in its derivative 1331 . This non-analytic behaviour stems generally from the 
interactions of an extremely large number of particles in a system, and does not show up with the same strength 
in systems that are too small 1321 . 

Phase transitions are generally classified into first or second order transitions. A second order, or continuous 
phase transition, can be defined as a point at which a system changes from one state to another one without a 
discontinuity or jump in its density, internal energy, magnetization, or similar properties. In the case of a first 
order transition, the above mentioned properties jump discontinuously as the temperature or pressure passes 
through the transition point l34l . The name of different kind of phase transitions comes precisely from the 
number of derivatives of the free energy that we have to count before we can see a discontinuous behaviour. If 
the first derivative is discontinuous, we have a first order transition, if not, it is a second order one ITTI . 
The first-order phase transitions involve a latent heat. During such transition, a system either absorbs or releases 
a fixed (typically large) amount of energy. Because energy can not be instantaneously transferred between the 
system and its environment, first-order transitions are associated with "mixed-phase regimes" in which some 
parts of the system have completed the transition and others have not. Continuous phase transitions, in many 
cases, are associated with a change of symmetry of the system and are easier to study than first-order transitions 
due to the absence of latent heat. They have shown many interesting properties. The phenomena associated 
with continuous phase transitions are called critical phenomena, because of their occurrence near critical points 
and because it turns out that continuous phase transitions can be characterized by parameters known as critical 
exponents ll32l . 

In the case of many phase transitions a non-zero value of an order parameter appears, i.e., some property of the 
system which is non zero in one phase (usually called the ordered phase) but identically zero in the other phase 
(disordered phase). Thus, the order parameter can not be an analytic function at the transition point. The order 
parameter is defined differently in various kinds of physical systems fTTll . For systems such as the ferromagnet. 



where there is a broken symmetry below critical temperature Tc, the order pai^ameter is the magnetization. For 
systems without broken symmetry, one chooses some quantity that is very sensitive to the difference between 
the two phases, and measures the difference of this quantity from its value at the critical point and below it. 
For the liquid-vapor critical point, we may choose the order parameter as the difference between the actual 
density of the fluid and the density at the critical point. For liquid crystals the degree of orientational order is 
considered as the order parameter 1341 . 

Another quantity of interest near a phase transition is the correlation function. In general, there will be 
microscopic regions in which the characteristics of the material are correlated. This is generally measured 
through the determination of a two point correlation function, which is the probability of finding that two sites 
separated by a distance r have the same value of a certain given quantity p [17J: 

Tp{r) = {p{Q)p{r)). (1.12) 

In the case of magnetic systems, the correlation function can be measured in neutron scattering experiments, 
whereas near the liquid vapour transition it can be measured by light scattering or small angle X-ray-scattering 
experiments 

If the correlation for the appropriate quantity decays to zero as the distance goes to infinity, then the order 
parameter is zero L17J . Close to the critical point, the correlation length <§, which tells us how far correlations 
are still present, becomes extremely large. This is directly related to the large amount of long-wavelength 
fluctuations that occur in the system at the criticality 1341 . The time taken for the system to change 
configuration near the critical point also increase significantly because of the divergence of the correlation 
length t,. This phenomenon is called critical slowing down. For example, in the case of the Ising model, 
spins tend to align with their neighbours due to the exchange interaction, and regions or clusters of spins 
pointing in the same direction appear. These spins are said to be correlated, and, generally, there are clusters 
of various sizes. The span of the largest one is the correlation length i^, while the time it takes to break up the 
existing conformation of spins and form another aiTangement of clusters is called the decoiTclation time T. At 
the critical point, there is a low probability for a spin in the middle of a spin cluster to change its direction, 
therefore spin regions are altered only at the boundary. This gives rise to a long decorrelation time which is 
related to the correlation length by a power law: 

Toc^S (1.13) 

where z is the dynamical critical exponent fl^ . For simulations of a finite lattice of linear dimension L, is 
naturally bounded by L and then the basic assumption is that: 

TocL^ (1.14) 

These two equations describe the critical slowing down. In an infinite system, as the critical point is 
approached, the coiTclation length diverges (its value is oo), and from (I1.13t . we see that the decorrelation 
time also diverges. In finite systems does not diverge as the critical point is approached, however, it reaches 
its peak with a sharp slope. Due to the power law dependence of r on , T will also display a peak with a sharp 
slope, exhibiting critical slowing down il9l . 

Near the transition points, the critical slowing down phenomenon produces important effects that complicate 
the implementation of the Monte Carlo method. This is the main reason why the scientists introduced 
alternative approaches besides canonical Metropolis algorithm, such as Wolff and Swendsen-Wang algorithms. 
The computational effect of critical slowing down near a critical point can be understood in the following 
manner: when we simulate finite systems at the critical point, the decon^elation time depends on the linear 
dimension L through a power law as L approaches infinity. Take, for example, the 2D Ising model. The 
dynamical critical exponent z is known to be approximately 2 using the Metropolis algorithm. If the time it 
takes to obtain 100 statistically independent configurations is ? in a system with L = 32, then if L is increased 
by a factor of 2 to 64, the computational time needed to obtain 100 statistically independent configurations will 



increase to 4^?. A factor of 4 is introduced because the number of spins is increased by 4, and another factor 
of 4 is due to the fact that T oc L^. In general, the amount of CPU time required to obtain a fixed number of 
statistically independent configurations for a system with Unear dimension L is proportional to L'^+^, where d 
is the spatial dimension of the model, and z is the corresponding dynamical critical exponent fTOll . 

Data from experiments, as well as results for a number of exactly solvable models, show that in the vicinity of 
the critical point T^, the thermodynamical properties can be described by a set of simple power laws M7il . For 
example, for the determination of the way which the magnitude of the order parameter approaches zero as the 
critical point is reached, we may write (according to the classical theories of phase transitions such as the van 
der Waals or mean field theories): 

M = Moe^, (1.15) 

where M is the order parameter (i.e., the magnetization for a ferromagnet), Mq is a constant that will vary from 
one system to another, e = \\ — T /Tc\, and the exponent j8 is called critical exponent 

The temperature variation of the order parameter is very important but not the only quantity of interest. Another 
key quantity is the specific heat, defined as the derivative of the internal energy with respect to the temperature. 
The specific heat is found to become infinite at the critical point in some systems but also one can have cases 
in which the specific heat is finite with only a sharp cusplike maximum at the critical point l34l . In either case, 
one may define an exponent a that characterizes the anomalous behaviour of the specific heat at the critical 
point: 

Cy=Co£"". (1.16) 

Susceptibility % is another quantity of interest. It is defined as the derivative of the order parameter with respect 
to the applied field to which it is coupled, under constant temperature condition. For a magnetic system, this 
quantity is precisely the magnetic susceptibility. This quantity becomes extremely large near the critical point, 
and we may write the zero field magnetic susceptibility as l34ll : 

Z=Zo£"^- (1-17) 

Finally, the correlation length varies as: 

^=^o£-\ (1.18) 

where, again, v is termed as critical exponent. 

Note that the last equations represent asymptotic expressions which are only valid if £ ^ and more complete 
forms would include additional corrections to scaling terms which describe the deviations from the asymptotic 
behaviour. The exact values of these critical exponents are known exactly only for a small number of models, 
most notably for the 2D Ising square lattice ESI , whose exact solution shows that a = 0, j8 = 1/8, and 7=7/4. 
Here, a = corresponds to a logarithmic divergence of the specific heat fTTll . 

The power law behaviour near critical points is very general and many systems share the same critical 
exponents. In particular, the Ising universahty class refers to the class of critical phenomena that share the 
same critical exponents as the Ising model 1191 . 

Although the critical exponents, a, j3, and / defined above may be independent in principle, they were found 
empirically, in the 1960's, to be connected by the relationship: 

a = 2-7-2j8. (1.19) 

This equality is known as the Rushbrooke relation, and the following three relations are also known lITTI . where 
t] and 8 are two additional critical exponents: 

Josephson: vD = 2 — a, 
Widom: y = j3(5-l). 

Fisher: 7 = v(2 — r\). 



In Table 1.1 we provide the theoretical values of the critical exponents for ^ < 4 2D Potts model, which of 
course fulfills the latter relations. 
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Table 1.1: Some theoretical critical exponents for the 2D Potts model l35l . 



The quantities discussed above are all equilibrium or static quantities; they can be measured in a time- 
independent experiment in thermal equilibrium conditions, and any involved correlation function refers to 
the correlation of fluctuations at a single instant of time. The majority of theoretical studies and experiments 
on critical phenomena are concerned with these static measurements. Thus, the usual division of systems into 
different universality classes is based on these static phenomena. There are other properties of systems, known 
as dynamical properties, which require a more detailed theoretical analysis. Moreover, they require a further 
subdivision of the universality classes. Two systems that belong to the same universality class for their static 
properties could show quite different behaviours in their dynamical properties. Some standard examples of 
dynamical properties are various relaxation rates of systems slightly disturbed from equilibrium, coiTclations 
involving fluctuations at two different time instants, and transport coefficients, e.g., thermal and electrical 
conductivities. Among the experiments used for studying dynamical properties we quote measurements 
of sound-wave attenuation and dispersion, widths of nuclear and electron magnetic resonance lines, and 
inelastic scattering experiments. Typically, one finds that the relaxation rate of the order parameter becomes 
anomalously slow at a critical point. However, some other relaxation rates are found to speed up and transport 
coefficients become large in a number of cases. In some cases, the results of a dynamical experiment may be 
interpreted as an indirect measurement of a static property of the system. As a matter of fact, some of the most 
precise measurements of static critical properties have been obtained by dynamical means. Examples ai^e the 
measurements of the superfluid properties of liquid helium, the low-frequency sound velocity of a fluid, and 
the frequency of nuclear magnetic resonance in a magnetic system l34l . 



1.7 The Histogram Method 

The canonical Metropolis algorithm yields mean values of various thermodynamical quantities, (energy, 
magnetization, etc) at particular values of the temperature T . Near a phase transition, many thermodynamical 
quantities change rapidly, and we need to determine these quantities at closely spaced values of T . If we use 
standard Monte Carlo methods, we will have to do many simulations to cover the desired T range 061. The 
use of histograms to overcome this problem became popular after the publication of a paper by FeiTcnberg 
and Swendsen in 1988 lITTI . However, the histogram technique is one of the oldest techniques proposed 
ll38ll39l . Also often referred to as Ferrenberg-Swendsen reweighting technique, is used in almost all Monte 
Carlo calculations of statistical physics, especially when dealing with phase transition phenomena f^. The 
idea is to use the knowledge of the equilibrium probability distribution at one value of T (and other external 
pai^ameters) to estimate the desired thermodynamical averages at neighbouring values. 

A Monte Cai4o simulation performed at T = To generates configurations of the system with a frequency 
proportional to the Boltzmann weight, e^^'^, where jSq = I/^bTq, and H is the Hamiltonian of the system 
being studied. In the case of a magnetic system, the probability of simultaneously observing the system with 



energy E and magnetization M is given by: 



P^[E,M) = -^W{E,M)e-^\ (1-20) 

where W{E,M) is the number of configurations (density of states) with energy E and magnetization M, and 
Z(j8o) is the partition function of the system. Because the simulation generates configurations according to 
the equilibrium probability distribution, a histogram H{E,M) can be built during the simulation to provide an 
estimate for the equilibrium probability distribution that becomes exact in the limit of infinite-length run. For a 
finite length-simulation, the histogram will present statistical eiTors, but H{E,M) /N, where A'^ is the number of 
measurements, still provides an estimate of Ppg{E,M) over the E and M values generated during the simulation 
II4TI . Keeping this in mind, we modify (11.201) as follows: 

H{E,M) = -J^W{E,M)e-^^, (1.21) 

where W{E,M) is an estimate of the true density of states, or number of configurations, W{E,M). 
The probability distribution for any value of j(3 has the same form as ( 11.201 : 



-I5E 



Pli{E,M) = ^W{E,M)e 
Comparing (I1.2U and (ll.22t . we can note that it is possible to determine W{E,M) from (I1.2U : 



(1.22) 



W{E,M) 



N 



H{E,M)e^'^, 



(1.23) 



and replace W(£',M) in M.22\ with it. After normalizing the distribution, we find that the relationship between 
the histogram measured at j8 = j3o and the (estimated) probabihty distribution for an aibitiary j8 is: 



From Pp{E,M), the average value of any function f{E,M) can be calculated as a continuous function of j8: 



{f{E,M))p = '£f{E,M)PpiE,M). (1.25) 

E,M 

The histogram method is useful only when the configurations relevant to the range of temperatures of interest 
occur with sufficient probability during the simulation at temperature Tq. For example, if we simulate an 
Ising model at low temperatures at which only ordered configurations occur (most spins aligned in the 
same direction), we can not use the histogram method to obtain meaningful thermodynamical averages at 
temperatures for which most configurations are disordered, and viceversa 1361 . 

In the single histogram technique, the estimated P{E,I5) is accurate only for j8 close to the reference value 
jSq. By generating many histograms that overlap each other we can widen the range of j3. This is called the 
multiple histogram technique \42\. It is also clear that we can increase the range of j3 by directly estimating 
the density of states W{E,M). Multicanonical sampling 1431 is an early technique proposed to do this. It is a 
very general and useful technique being often the method of first choice for a variety of problems that include 
critical slowing down near second order phase transition points, nucleation in first order phase transitions, and 
trapping in the metastable minima in systems with rugged energy landscapes. 



1.8 Identifying the Nature of Transitions and Finite Size Scaling 



The behaviour near phase transitions has been one of the main objectives of studies focusing on the properties 
of physical systems but a correlation length greater than the accessible size L of the system may lead to 
many difficulties ll44l . For systems close to a second order phase transition, finite-size scaling is routinely 
used to extract thermodynamic information from similar systems of fairly small size. An equivalent theory for 
first order phase transitions is clearly also of interest. A useful theory of finite-size scaling should allow us to 
extract the couplings at which the transition occurs, as well as other dimensional quantities like latent heat (or 
spontaneous magnetization) and specific heat (or magnetic susceptibiUty) 1451 . 

First order transitions are characterized by a discontinuity in the order parameter and thermodynamic 
quantities, with an associated delta-peak behaviour in the susceptibility. As a matter of fact, the jump in the 
energy density is equivalent to the latent heat. However, at finite size, thermodynamic quantities become 
continuous and rounded. Instead of delta function behaviour in susceptibility there is only a hump. In 
simulations, this behaviour is visible only if the simulation time is larger than the decorrelation time t 
at the transition point. is typically very large since T oc e^^^^ , where a is the surface tension of the 
interface between the low temperature and high temperature phases l47l . It is the dimension D that now plays 
the key role rather than the critical exponents as in the case of second order phase transitions 1 17|. 

At the transition temperature of a first-order phase transition, a mixed state can exist where two different bulk 
phases ai^e separated by an interface. The free energy densities of the two bulk phases ai^e equal and the free 
energy of the mixed state is higher than any of the coexisting pure phases by an amount Fg = aA, where A 
is the area of the interface and a is the interface tension 1148 ll . In first order phase transitions, the correlation 
length remains finite in both the ordered and disordered phases, i.e., the con^elation length does not diverge. 
Thus, a different approach to finite size scaling must be used lITTI . 

From fairly general ai^guments about the nature of discontinuities at a first-order phase transition. Fisher and 
Berker obtained the infinite volume limit approached by measurements performed at finite volumes. This 
conventional scenario is based on a smooth behaviour of the renormalization group flow and the existence of a 
discontinuity fixed point whose attraction domain contains the transition surface and has relevant exponents of 
the form 3; =D EH. The singularities associated with first order transitions are generated by infinite iterations 
of renormalization group transformations in the thermodynamic limit. Correction terms were later calculated in 
a particular phenomenological model called the double-Gaussian model, in which the peaks in the probability 

distribution for the coexisting phases were approximated by Gaussians [50 51 J. This model correctly predicts 

the first term in a series of corrections in inverse powers of the volume V, around the leading term obtained by 
Fisher and Berker B9l . 

More recent developments are due to Borgs, Kotecky and Miracle-Sole ll52l l53l . The basic idea is to 
decompose the partition function into a sum of the contributions, each due to one of the coexisting phases, and 
to neglect contributions due to phase mixtures. Each of these contributions to the total partition function then 
yield quantities related to free energies in the pure phases. The analysis proceeds by power series expansions 
of these partial partition functions around the phase transition point, leading to moments expressed in inverse 
powers of the volume [45 1- According to this theory, for periodic boundary conditions, the specific heats and 
Binder cumulants at the transition temperature can be represented by polynomials in If the L » ^, 

the contribution of the higher order terms are negligible 1551 . The difficulty arises when ^ > L. In this 
case, higher order corrections are necessary and deciding the order of the transition becomes difficult. Even 
when large lattices are used, higher order terms may create difficulties during the fitting procedure to the 
simulation data. Such difficulties may be reduced by choosing the quantities for which the coiTcction terms 
play less important role. A good example for such quantity is the average energy measured at the infinite lattice 
transition point, which has exponentially small correction term enabling one to determine the infinite lattice 
critical point with great accuracy I53ll54ll56ll . 

Finite size scaling ideas for first or second order transitions help to extract critical exponents and other 



information, but tliis requires prior knowledge of at least the nature of the transition. When the system 
undergoes a weak first order transition with ^ » L, it becomes very difficult to identify its nature even 
with large-scale computations. This problem is even worse when one encounters a system for which nothing 
is known l44ll57l . 

Lee and Kosterlitz 11571 proposed a method which exploits the finite size scaling properties of the free energy 
AF{L). These properties ai^e unambiguous even when ^ »L and, more importantly, can be implemented with 
reasonable computational effort. This method depends on two key ideas: the identification of AF{L), which 
has a characteristic behaviour as a function of L at a first or second order transition or in a single phase region, 
and the usage of histograms enabling this to be computed accurately. They have shown that the positions of 
the peak free energies in a histogram should scale as 1/L if the system is well into the first order region. The 
ratio of P{E) at its peaks and minimum can be used to estimate an interface free energy AF{L), signaUng a 
first order transition if it increases with system size L. 

This method uses the Helmholtz free energy F of a system. At low T, the low energy configurations dominate 
the contributions to the partition function Z, even though there are relatively few such configurations. At high 
T, the number of disordered configurations with high E is large, and hence high energy configurations have a 
big contribution to Z. These considerations suggest that it is useful to define a restricted free energy Fr{E) that 
includes only the main configurations at a particular energy E: 

FriE) = -kT[lng{E)]e-^/'^. (1.26) 

For systems with a first-order phase transition, a plot of Fr{E) versus E will show two local minima 
coiTcsponding to configurations that are characteristic of the high and low temperature phases. At low T, the 
minimum at the lower energy will be the absolute minimum, whereas at high T the higher energy minimum 
will be the absolute minimum of Fi-E. At the transition temperature, the two minima will have the same value 
of Fr{E). For systems with no transition in the thermodynamical limit, there will only be one minimum for 
all T . How will Fr{E) behave for the relatively small lattices that we can simulate? In systems with first- 
order transitions, the difference between low and high temperature phases will become more pronounced as 
the system size is increased. If the transition is continuous, there are domains at all sizes, and we expect that 
the behaviour of Fr{E) will not change significantly while increasing the size. If there is no transition, there 
might be a fake double minima for small systems that disappear for larger systems l36l . Lee and Kosterlitz 
proposed the following method to classify phase transitions: 

1. Perform a simulation at a temperature close to the suspected transition temperature and calculate H{E). 
Usually, the temperature at which the peak in the specific heat occurs is chosen as the simulation 
temperature. 

2. Make use of the histogram method to calculate /%-(£") oc — lnHQ{E) + (j8 — I5q)E at neighbouring values 
of r. If there are two minima in F,-{E), vary )8 until the values of F,-{E) at the two minima are equal. 
The corresponding temperature is an estimate of the possible transition temperature . 

3. The difference between the maxima and the minimum between the two peaks is used to estimate the free 
energy barrier AFr{E) at T^. 

4. Repeat steps (1-3) for larger systems. If ,-(£") increases with size, the transition is first order. If AFr{E) 
remains the same, the transition is continuous. If AFr{E) decreases and goes to zero with size, there is 
no thermodynamic transition. 

The above procedure is applicable when the phase transition occurs by varying the temperature. Transitions 
also can occur by varying the pressure or the magnetic field. These field-driven transitions can be tested by a 
similar method. For example, consider the Ising model in a magnetic field at temperatures below T^. As we 
vary the magnetic field from positive to negative values, there is a transition from a phase with magnetization 



M > to a phase with M < 0. Is this a first-order or continuous transition? To answer this question, we can 
use the Lee-KosterUtz method with a histogram H{E,M) generated at zero magnetic field, and calculate Fr{M) 
instead of F,.{E). The quantity Fr{M) is proportional to — ln^£//(£',M)e^('^^'^^^. Because the states with 
positive and negative magnetization are equally likely to occur for zero magnetic field, we should see a double 
minima structure for Fr{M) with equal minima. As we increase the size of the system, Af,- should increase for 
a first order transition and remain the same for a continuous transition [36). 

Another way to determine the nature of a first order phase transition is to use the Binder cumulant of energy 
defined by El: 

If various cumulants (each one corresponding to different lattice sizes) are plot in the same graph, a behaviour 
chai^acteristic of a first order transition appeai^s as will be discussed in the next section. 
It can be shown that the minimum value of Ui is 

f/.,.. = 3-3(^)+0(L-), (1.28) 

where £"+ and are the energies of the two phases in a first order transition. These results are derived by 
considering the distribution of energy values to be a sum of Gaussians about each phase at the transition point, 
which become shaiper and sharper as L ^ oo |36l . 



On the other hand, equations dl.lSt to dl.lSt for second order transitions are valid only for infinite systems 
and, as a matter of fact, we can simulate only finite systems. Quantities that diverge in the infinite case now 
present peaks in the finite system. Furthermore, the peaks occur at a value Tc{L), for a given linear dimension 
L, slightly different from the infinite-lattice critical temperature Tc. However, at a second order phase change, 
the critical behaviour of a system in the thermodynamical limit can be extracted from the properties of finite 
systems by examining the size dependence of the singular part of the free energy density. This finite size 
scaling approach was first developed by Fisher f59| . According to his theory, the free energy of a system of 
linear dimension L is described by the scaling ansatz: 

F{L,T,h)=L-^^-"^/''F\tL^/\ h&+^^l^), (1.29) 

where t = {T — T^)/Tc, h is the magnetic field and F° is a scaling function. The critical exponents a, j3, 7, and 
V all correspond to the values for the infinite system. Appropriate differentiation of the free energy yields the 
various thermodynamic properties with their corresponding scaling forms: 

C = L«/^CO;C(, (1.30) 



where Xt = tL^/^ is the temperature scaling variable |4T]| . 

To determine the transition temperature accurately one find the location of the peak in a thermodynamic 
derivative, for example, specific heat. For a finite lattice the peak occurs at the temperature where the scaling 
function Z*'(x,) is maximum, i.e., when 



dZ^{xt) 



dxf 



= 0. 

x,=x* 



This temperature is the finite lattice (or effective) transition temperature Tc{L), defined through the condition 
Xt = Xj to vary with the lattice size, asymptotically, as: 



Ul) = t, + t,x;l-^/\ 



These results for the scahng of thermodynamic quantities and Tc{L) are vahd only for sufficiently lai^ge L 
and temperatures close to Tc. Corrections to finite size scaling must be taken into account for smaller systems. 
These are introduced as power law corrections with an exponent — w, such that, for example, the magnetization 
at Tc would scale with system size like L^^/^{ \ +cL^^'). As we move away from T^, corrections to scaling 
due to in^elevant scaling fields, or nonlineaiities in the scaling variables must be introduced. CoiTcctions due 
to irrelevant fields are expressed in terms of an exponent 6 leading to additional terms like ait^ +a2t^^ + 
while nonlinearities in the scaling variables give rise to corrections terms of the form b\t^ +b2t^ + l4Tll . 

If we take one correction term into account, the estimate for Tc{L) is then modified in terms of the coupling 
K = J /ksT as follows: 

K,{L)=K, + XL-^/''{l+bL-''-). 

Before this equation can be used to determine Kc, it is necessary to have an accurate estimate for v and accurate 
values for Kc{L). 

It has traditionally been difficult to determine v from Monte Carlo simulation data because of a lack of 
quantities which provide a direct measurement. This situation was greatly improved by Binder's introduction 
of the fourth order magnetization cumulant U l58l defined by: 



U = l-^, (1.31) 



where m is the magnetization per spin. Binder showed that the slope of the cumulant at Kc, or anywhere in 
the finite size scaling region, varies with system size like L'/^. In particular, the maximum value of the slope 
scales as L^/^. If we take into account a correction to scaling term, the size dependence of the peak becomes: 

^|,„,, = «lV^(i+z,l--). 

The location of the maximum slope of U also serves as an estimate for an effective transition coupling which 
can be used to determine Kc. In the same paper. Binder introduced the cumulant crossing method which extracts 
a transition temperature by examining the behaviour of the magnetization cumulant for different lattice sizes. 

Additional estimates for v can also be obtained by considering the logarithmic derivative of any power of the 
magnetization, which has the same scaling properties as the cumulant slope. The location of the maximum 
slope also provides an additional Kc{L): 

^InK) = O^^K) 

To this end, the methods of finite size scaling are very helpful to determine the behaviour of infinite systems 
from data obtained on finite systems. 



1.9 Monte Carlo Simulations on the Betts Lattice 

Research of properties of lattices distinct from the commonly studied ones (square, triangular lattice) is a 
key step in the development and prediction of the behaviour of possible new materials. A different lattice 
proposed by Donald Betts is constructed removing 1\7 of the sites in a two dimensional triangular lattice l65l . 
accomplishing that each vertex has a coordination number of five and yielding another translationally invariant 
lattice (see Fig ]1.7t . This structure is known as Betts or Maple Leaf lattice, and lies between the kagome and 
triangular ones, which have coordination numbers of four and six, respectively. It has a hexagonal unit cell of 
six sites and fifteen bonds, it is invariant under rotations through multiples of 60°, and, contrary to the kagome 
and honeycomb lattices, it has no inversion symmetry 1551 . To study the critical behaviour of this lattice, we 



performed Monte Cai^lo simulations using the Potts model for <7 = 3, ^ = 4 and q = 5. 



Figure 1.7: Maple Leaf lattice 



For the q-Potts model, the magnetization is defined as follows: 

m = — — — — , (1.33) 
1-1/^ 

where Nmax is the maximum number of equally oriented spins for certain configuration. We denote the lineal 
size of the system studied as L, and this is related to the number of sites as nsit = L x L x 6. Earlier work 
has been already done on this lattice for q = 3, using the Metropolis algorithm, by Wang and Southern WJi . 
We applied Wolff algorithm instead, due to its proved better performance, and obtained similar results for 
ferromagnetic and antiferromagnetic cases. As predicted, calculations shown a second order transition for the 
fenomagnetic case and a first order transition for the antifeiTomagnetic case. For q = 4 and ^ = 5 there is no 
published work. We focus on the ferromagnetic regime in which the transition is found to be of second order 
for ^ = 4 and of first order for g = 5. In the latter case, the transition is very weak and more calculations are 
needed to obtain better results. 



1.9.1 ^ = 3, 7 < 0: Antiferromagnetic Case 

We selected four lattice sizes L = 12, 18, 24 and 36 to perform Monte Carlo simulations. The number of 
Monte Carlo steps used to equilibrate the system before making the average was of the order of 2 x 10^, and 
the number of steps used for averaging was 6 x 10^. Binder cumulants of the order parameter £ as a function 
of temperature for all lattice sizes demonstrate that the system undergoes a first order transition, as each curve 
shown a deep minima whose value moves to lower temperature regions (Fig. II. 8t . The critical temperature is 
obtained from the deep minimums showed by all curves, and its near Tc = 0.444. 

In Fig. 11.91 specific heats for each lattice size are plotted. There, the lattice size effect on the results can bee 
cleai^ly seen: the peaks are sharper and moves toward smaller temperatures at larger lattice sizes. The transition 
temperature can be estimated as the temperature where the peaks have their maximum values, and obviously, 
the best approximation is obtained for the largest lattice size. 
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Figure 1.8: Energy cumulants suggesting a first order phase transition for ^ = 3, / < 0, = 0.444 and four 
lattice sizes. 




Realizing that the phase transition appears to be of first order, the next step is to calculate the energy distribution 
histograms P{E) for various lattice sizes near the estimated critical temperature. We used 1 x 10^ steps to 
equilibrate the system and 4 x 10^ steps for averaging. The histograms always present two well-defined peaks, 
and while increasing L, the minimum between the peaks becomes deeper. Moreover, the histograms are sharper 
when more sites are taken into account (see Fig. 11.101) . As explained in section 1.8, this is typical for first order 
phase transition, confirming the nature of the transition for this case. 




E E 

Figure 1.10: Energy histograms for lattice sizes a) L = 12, b) L = 18, c) L = 24, and d) L = 36 for ^ = 3, 7 < 0. 

The results shown in the present subsection correspond well with the values reported by Wang and Southern 
l67i . The transition temperature reported by them is Tc = 0.445 and their histograms present a behaviour 
identical to ours. 



1.9.2 ^ = 3, 7 > 0: Ferromagnetic Case 



In this case, the used lattice sizes are L = 18, 24, 30, 36, 48, 54 and 60. We considered a larger number of lattice 
sizes in order to have more points available to estimate the critical exponents. The number of Monte Carlo 
steps used to thermalize was 2 x 10^, and the number of steps for averaging was 6 x 10^. Binder cumulants 
of the order parameter m as a function of temperature for the various L values demonstrated that the system 
undergoes a second order transition. This is presented in Fig. 1.14. The critical temperature is obtained from 
the intersection of all curves, each curve corresponding to a distinct lattice size. The obtained value for the 
critical temperature is = 1.2275. In Fig. 1. 15, specific heats for different values of L are shown. 




We used finite size scaling techniques (see section 1.9) to calculate the critical exponents. To obtain v, for 
example, we calculated the logarithmic derivative of the magnetization in a range near the critical temperature 
for all lattice sizes selected, and the maximum value obtained for each curve was plotted against lattice size in 
a log-log plot. A line was fitted to these points, and its slope gave an estimate of the value of 1/v. Fig. 1.16 
illustrates the procedure. It is important to note that logarithmic derivatives of higher orders of magnetization 
can be also used to obtain estimations of 1/v. 



To calculate a, the quantities plotted as functions of lattice sizes are the maximum values of specific heat Cy. 
Again, a linear fit gives the value of a/v, from which a can be estimated using the value of v obtained earlier. 
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Figure 1.13: Values of the logarithmic derivatives of the magnetization for different sizes of Betts lattice versus 
the logarithm of L. The slope of the fitted line y gives the value of v for <7 = 3. 



The data and the linear fit are shown in Fig. 1.17. 
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• IVIC results 
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Figure 1.14: Log-log plot of the maximum values of Cy for distinct sizes of Betts lattices. The fit gives the 
value of a/ V for ^ = 3. 



The critical exponent j3 is extracted from the magnetization values at the critical temperature suggested by the 
Binder cumulant of magnetization. The logarithm of these values (remember that each value corresponds to a 
lattice size) are plotted versus the logarithm of L, and the slope of the line fitting the data corresponds to — jS/v. 
If instead, the maximum values of the susceptibility are plotted versus L, the critical exponent 7 is obtained 
using the same procedure, (see Figs. 1.18 and 1.19). 
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• MC results 
— linear fit 

y = - 0.11 888X- 0.01 8635 
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Figure 1.15: Logarithms of magnetization at the value suggested by the magnetization cumulant versus 
logarithms of L values. The fit gives — /3 /v for ^ = 3. 
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• MC results 
— linear fit 

y = 1 .7629 - 0.48606 
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Figure 1.16: Logarithms of susceptibility at the critical temperature versus logarithms of different linear sizes 
L. The fit gives 7/ v for ^ = 3. 



One of the common procedures used to obtain the transition temperature consists in plotting the temperature at 
which the logarithmic derivatives of the magnetization for each lattice size have their maxima versus L^^/^ . A 
line is fit to the data and the intersection of this line with the 3'-axis gives an approximate of the true transition 
temperature. This can be seen in Fig. ll.nl from which one gets Tc = 1.22676. 
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• MC results 
— linear fit 

y = 1 .0376X + 1 .2268 
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Figure 1.17: Estimation of the transition temperature = 3, 7 > 0). 



In the next table, the critical exponents calculated by Wang and Southern 1671 . the values obtained in this work, 
and the theoretical values are summarized. The values obtained with the Monte Carlo simulations agree well 
to the 2D Potts classical values, but are not perfectly equal. This can be due to numerical enws, lattice size 
effects and also because the Betts lattice can be seen as a triangular lattice with a large number of defects. 
Something that is not so clear to us is why values obtained with the Wolff algorithm are less similar to the 
universal values than those calculated by Wang and Southern. 





Wang & Southern Results 


Our Results 


Theory 


a/v 


0.42 ±0.04 


0.464068 ± 0.00479 


0.4 


15/v 


0.132 ±0.002 


0.1 18885 ± 0.000203 


0.13333 


Y/v 


1.74 ±0.05 


1.76294 ±0.01072 


1.73333 


1/v 


1.19 ±0.02 


1.20723 ±0.004891 


1.2 



Table 1.2: Comparison of the reported critical exponent values with the universal values predicted for the q = 3 
2D-Potts model. 



1.9.3 q = 4, J > 0: Ferromagnetic Case 



The used lattice sizes are once again in the range L = 18 to L = 60. The number of Monte Carlo steps used 
to thermalize is 2 x 10^, and the number of steps for averaging is 6 x 10^. The Binder cumulant of the order 
parameter m as a function of temperature shows that the system undergoes a second order transition, and it 
is displayed in Fig. 1.21. The critical temperature is obtained from the intersection of all curves, each curve 
corresponding to a distinct lattice size, and is near Tc = 1.126. In Figs. 1.22 and 1.23, specific heats and 
susceptibilities for different values of L are shown. 



Figure 1.18: The Binder magnetiza- 
tion cumulant for ^ = 4, 7 > 0. 



Figure 1.19: Specific heats for dif- 
ferent lattice sizes = 4, / > 0). 



Figure 1.20: Susceptibilities for 
different lattice sizes (^7 = 4, / > 0). 




kT 



The critical exponents were obtained with the same procedures explained for q = 3. Different thermodynamic 
quantities are calculated for each lattice size, and the values near critical temperature are plotted against linear 
size in various log-log plots. A line is fit to the data and its slope is representative of some critical exponent, 
depending on which thermodynamic quantity was selected to be plotted (Fig. 1.24 to Fig. 1.27). The critical 
temperature is estimated in the same way explained eaiiier, and is shown in Fig. 1.28. 



Figure 1.21: Values of the 
logarithmic derivatives of 
magnetization for different 
sizes of Betts lattice, versus 
logarithm of L. The fit gives 
the value of 1 / V for q = 4, 
J>0. 



Figure 1.22: Logarithm of 
the maximum value of Cy 
for different sizes of Betts 
lattice, versus logarithm of L. 
The fit gives the value of a/v 
for ^ = 4, 7 > 0. 
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• MC results 
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y = 1 .3852X + 0.38994 
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• MC results - 
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y = 0.64454 + 0.87703 
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• MC results 
— linear fit 
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• MC results 
— linear fit 
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Figure 1.23: Logarithms of 
tlie magnetization values at 
the critical temperature of 
various lattice sizes versus 
logarithms of L. The fit gives 
-j3/vfor^ = 4,7>0. 



Figure 1.24: Logarithms of 
the susceptibilities at the 
critical temperature of vari- 
ous lattice sizes versus loga- 
rithms of L. The fit gives 7/ v 
for g = 4, / > 



Figure 1.25: Estimation of 
the critical temperature for 
^ = 4, 7 > 0. 



The values obtained for the critical exponents are summaiized in the next table, along with the theoretical 
values expected. The value obtained for jS is not near the expected value, and this can be due to an effect of 
magnetic frustration on the lattice. More calculations need to be done in the future, with larger lattice sizes, 
however it could be that this lattice does not belong to an universal group. 





Our Results 


Theory 


a/v 


0.644537 ±0.03695 


1 


P/v 


0.0562526 ±0.004001 


0.125 


Y/v 


1.75564 ±0.02506 


1.75 


1/v 


1.38519±0.03172 


1.5 



Table 1.3: Comparison of critical exponent values obtained by us with the universal values predicted 
theoretically for ^ = 4 2D-Potts model. 



1.9.4 q = 5, J > 0: Ferromagnetic Case 

The lattice sizes considered once again are in the range L = 18 to L = 60. The number of Monte Carlo steps 
used to thermalize is 2 x 10^, and the number of steps for averaging is 6 x 10^. The Binder cumulant of the 
order parameter £ as a function of temperature suggests that the system undergoes a first order transition, and 
it is presented in Fig. 1.29. The critical temperature is obtained from the deep of all curves, each curve corre- 
sponding to a different lattice size, and is neai^ Tc = 1.0575. Specific heats are shown in Fig. 1.30. 

For the calculation of energy histograms, the number of Monte Cai^lo steps used for thermalization is 1 x 10^, 
and the number of steps for averaging varies from 3 x 10^ for lattice sizes until L = 36, and 4 x 10^ for the 
next lattice sizes. The results are shown in Fig. 1.31. 

Energy histograms confirm that the transition is first order. The histograms are narrow at increased lattice size, 
and the valleys also become deeper. 
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Figure 1.26: Energy cumulant for ^ = 5, 7 > 0. The transition temperature is near 1.0575. 




Figure 1.27: Specific heats for <7 = 5, / > 0. 
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Figure 1.28: Energy histograms for 
lattice sizes a) L = 18, b) L = 24, c) 
L = 30, d) L = 36, e) L = 48, f) L = 54 
and g) L = 60. 



1.9.5 Conclusion 



The results obtained with Monte Cai^lo simulations and finite size scaling techniques show clearly the kind of 
transition for each of the cases presented. The calculated critical exponents were near the theoretical values 
for second order phase transitions, except for the exponent j3 in the case q = 4 that requires a more detailed 
analysis. Remember that for / > the transition is supposed to be of second order for ^ < 4 and of first order 
for q > 4. As it lies at the border between the two, the case ^7 = 4 is difficult to assess. Another aspect that 
must be taken into account for further analysis is the type of lattice, because it is quite probable that magnetic 
frustration effects could modify the magnetization-related critical coefficients. For first order transitions, the 
values of the free energy barriers could be estimated from the difference between the two peaks and the valley. 

As we mentioned at the end of section 1.1, Monte Carlo simulations are a helpful tool in other areas. Thus, 
in the next chapter we will move on and review a cluster identification technique that involves Monte Carlo 
calculations for the analysis of biological data. 
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2.1 Proteins, DNA and Gene Expression 



Proteins are the complex molecules that make life possible. Keratin, essential in the structural conformation 
of our hair and nails, is one among the many proteins used as supporting materials in biology. Cells are also 
made of proteins, and enzymes, which are responsible for all chemical reactions inside living organisms, are 
proteins too. The information to produce the sequence of amino acids conforming certain protein resides in 
the DNA molecule, making it of great importance for life. 

A single strand of DNA is formed by unities named nucleotides. Each nucleotide is composed by deoxyribose 
(a sugar formed by five carbons), a phosphate group, and one of the four possible nitrogenous bases: adenine 
(A), cytosine (C), guanine (G) and thymine (G). Phosphodiester bridges link the phosphate group of a 
nucleotide and the sugar of the next one, building in this way a chain of nucleotides. The DNA molecule 
is formed when two of these linear chains are joined by hydrogen bonds connecting the nitrogenous bases 
standing out of the sugar-phosphate backbone of each chain. All this chemical construction has a double helix 
structure envisioned in 1952 by Cricks and Watson fT\. The diameter of the helix is of 2 nm, and adjacent 
bases are sepai^ated by 0.34 nm along the helix axis. Hence, the helix repeats itself every 10 residues on each 
chain at intervals of 3.4 nm. Only specific pairs of bases can form hydrogen bonds: the purine base A always 
pairs with the pyrimidine base T through two hydrogen bonds, and the other pyrimidine base C always pairs 
with the purine base G this time by three hydrogen bonds A possible explanation for this situation is that 
two purines require more than 2 nm for connection, which does not fit within the diameter of the helix. On the 
other hand, there is too much space for two pyrimidines to get close enough to form hydrogen bonds between 
them |I3|- The rules of base pairing tell us that if we can "read" the sequence of nucleotides on one strand 
of DNA, we can immediately infer the complementary sequence on the other strand. Thus, DNA looks like 
a chemical code based on four letters, each one con^esponding to the four nitrogenous bases, aligned along a 
double-helicoidal chain. At the same time, the genetic code is a universal translation table formed by three- 
bases words called codons. Each codon codifies for a specific amino acid, so different sequences of codons 
build distinct proteins, and they can also build RNA molecules with a functional role. One can obtain 64 
distinct combinations mixing the four nitrogenous bases in clusters of three letters. This quantity is enough to 
code for the 20 amino acids forming all proteins, and, as a matter of fact, almost all amino acids have more 
than one codon to codify them. Three of these triplets are left apart to code for chain termination to release 
proteins at the end of their production process. Moreover, one triplet is left as a signal to start the synthesis 
process. A segment of DNA sequence with the instructions to codify for a functional product, protein or RNA, 
is named a gene, and the genome is the collection of all the "recipes" for the products that an organism needs. 
The genome of a simple organism such as yeast has around 7,000 genes and the latest estimate for the human 
genome is of 25,000 genes |4||. It is important to stress that not all chain segments codify for proteins or RNA 
molecules: an overwhelming majority of human DNA (98%) contains non-coding regions (introns) that do not 
represent any particular functional product |5J, although it is believed that they help to protect the genes. 

Every cell of multicellular- organisms has the entire set of information needed, but only some genes ai^e 
expressed depending on the function of the cell. For example, cells in our retina need photosensitive molecules, 
whereas our liver do not. A gene is expressed in a cell when the protein or RNA it codes for is synthesized. 
The large majority of abundantly expressed genes are associated with common functions, such as metabolism, 
and hence are expressed in all cells. However, there will be differences in the expression profiles of different 
cells, and even in a single cell, expression will vary with time, in a manner dictated by external and internal 
signals that reflect the state of the organism and the cell itself 13. 

Although DNA molecule contains all the instructions to make a huge amount of diverse proteins, DNA is not 
able to come out of the eucaryotic cell nucleus. Therefore, when certain protein is needed, another molecule 
called messenger RNA (m-RNA) is formed from DNA in a fundamental process called transcription, and is 
this molecule who travels outside the nucleus canying the information. RNA is also a nucleic acid but is 
formed by a single of nucleotides, and its sugar (ribose) is slightly different from deoxyribose. Besides, RNA 
has uracil U instead of the base thymine T. In cells, one can find three important types of RNA: messenger 



RNA (m-RNA), which ttansports the instructions to make a protein from the nucleus to the ribosomes, transfer 
RNA (t-RNA) which carries the amino acids to the ribosomes where the proteins are assembled and is found 
in the cytoplasm, and, finally, the r-RNA, or ribosomal RNA, which is one of the substances from what the 
ribosomes are made out |6l|. 

In the transcription process, the portion of DNA containing the sequence for the needed product splits off, and 
then the free RNA nucleotides existing in the nucleus are attached to the exposed DNA nucleotides forming 
a complementary chain of RNA. This m-RNA chain comes out of the nucleus carrying a complementary 
sequence and arrives finally to the ribosome, where the sequence is translated into a protein(translation 
process). A cell may need a large number of some proteins and a small number of others, i.e. every gene 
may be expressed at a different level. The manner in which the instructions to start and stop ti'anscription are 
given for a certain gene is governed by regulatory networks. Transcription is regulated by special proteins, 
called transcription factors, which bind to specific locations on the DNA, upstream from the coding region. 
Their presence at the right site initiates or suppresses transcription. The basic paradigm of gene expression 
analysis states that the biological state of a cell is reflected by its expression profile: the expression levels of 
all the genes of the genome. These in turn are reflected by the concentrations of the corresponding m-RNA 
molecules 13. 

Several genomes of diverse organisms have been completed in the last years, the human genome, published on 
2002, among them. Now, the main focus is to understand the underlying function and mechanisms behind these 
genomes. Some of the questions that remain unanswered included what are the functional roles of different 
genes, how genes are regulated, how do genes and gene products interact, how does gene expression level 
differ in various cell types and states, and how is gene expression changed by various diseases or compound 
treatments Q. 

2.2 DNA Microarrays 

Although m-RNA is not the ultimate product of a gene, transcription is the first step in gene regulation, 
and information about the transcript levels is needed as a first approach for understanding gene regulatory 
networks. DNA microarrays or DNA chips are one of the latest breakthroughs in experimental molecular 
biology precisely because they allow to monitor the expression of thousands of genes at the same time. 
The potential this technology is tremendous: monitoring gene expression levels in different developmental 
stages, tissue types, clinical conditions and different organisms can help understanding gene function and gene 
networks, assist in the diagnosis of disease conditions and reveal effects of medical treatments. 

There ai^e currently two main technologies that generate large-scale gene expression data: cDNA and 
oligonucleotide microarrays. CDNA microarrays contain large sets of complementary DNA sequences several 
hundred bases long, each set representing a gene, immobilized on a solid substrate. In oligonucleotide 
microarrays, each gene is represented on the array by a set of 15-20 different oligonucleotide probes designed 
to hybridize perfectly to some particular sequence, and some mismatch control oligonucleotides, identical to 
the perfect match probes except for a single base-pair mismatch. These mismatch control oligonucleotides 
allow estimation of cross-hybridization, improving reproducibility and accuracy of RNA quantification, and 
reducing the rate of false-positives. In general, oligonucleotides used consist about 20-25 nucleotides, and are 
synthesized in situ with photolithography techniques |8l|, ||9l. In brief, functioning of microarrays is based on 
the preferential binding of complementary single stranded nucleic acid sequences, and a single microarray may 
contain tens of thousands of spots. One of the most popular experiments involving cDNA microarrays consists 
in compare m-RNA abundance in two samples. The m-RNA molecules are extracted from cells taken from two 
tissues of interest (e.g. tumour and normal tissues) ifTOll . They are reverse transcribed from RNA to DNA and 
their concentration is enhanced. The resulting DNA is transcribed back into fluorescently marked single str^and 
RNA. For example, tumour tissue is labeled with a red dye and normal tissue with a green one. The solution 
of marked and enhanced m-RNA molecules ("targets") that are copies of the m-RNA molecules originally 
extracted from the tissue, is placed on the chip and diffuses over the collection of single strand DNA probes. 



When an m-RNA encounters a pait of the gene of which it is a perfect copy, it hybridizes to it with a high 
affinity (considerably higher than with a bit of DNA of which it is not a perfect copy) and when the m-RNA 
solution is washed off, only those molecules that found their perfect match remain stuck to the chip. Next, the 
chip is illuminated with a laser, and the stuck targets fluoresce. If RNA from tumour tissue is in abundance, 
the spot will emit red light, but if instead RNA from normal tissue is in abundance, it will appear green. If 
tumour and normal RNA bind equally, the spot will be yellow, while if neither binds, it will not fluoresce 
and appear black Q. Therefore, by measuring the light intensity emanating from each spot, one obtains a 
measure of the number of targets that stuck, which, in turn, is proportional to the concentration of these m- 
RNA in the investigated tissues. CDNA microaiTays are a differential technique because only ratios between 
both fluorescence wavelengths give meaningful information and hence, only relative expressions levels are 
obtained. On the other hand, with oligonucleotide arrays, absolute expression levels are measured lITTl . 
The characterization of genes expressed differently in normal and their corresponding tumour cells has been 
particularly important 1 12|. Arrays have also being used to discover transcribed regions in genomic DNA ITbII : 
to detect polymorphism in copy number of regions of the genome ll4l . which may be a new and important class 
of mutation; and to analyze amplifications and deletions that are associated with oncogenic transformation and 
some inherited conditions fTSll . fT6l . A number of diagnostic applications for arrays have been suggested. The 
first to be granted approval by the US Food and Drug Administration is the Roche AmpliChip for cytochrome 
P450. This test will help doctors determine an individual's genotype to determine appropriate drugs and doses 
to prescribe, minimizing harmful drug reactions [171. 



2.3 Gene Clustering 

Microarray data analysis can be divided into two general classes: supervised and unsupervised analysis. The 
supervised approach assumes that for some (or all) profiles we have additional information, such as functional 
classes for the genes, or diseased/normal states attributed to the samples. We can viewed this additional 
information as labels attached to the rows or columns. Having this information, a typical task is to build a 
classifier able to predict the labels from the expression profile. If a clasiffier that is able to distinguish between 
two different, but morphologically closely related tumour tissues, can be constructed, such a classifier can be 
used for diagnosis. Classifiers are trained on a subset of data with a priori given classification and tested on 
another subset with known classification. After assessing the quality of the prediction they can be applied to 
data with unknown classification fj). Unsupervised data analysis consists on clustering expression profiles to 
find groups of co-regulated genes or related samples. An example of these two kind of clustering combined to 
predict clinical outcome of breast cancer can be seen at 1181 . 

Some short DNA sequences in or around the gene, specifically in the promoter region, serve as switches that 
control gene expression. Special proteins (transcription factors) interact with these binding sites, and represses 
or activates the transcription process of the gene |11|. Various genes that share common functions or the same 
regulatory mechanism at the sequence level, can have the same binding site sequence. As a result, similar 
expression patterns can correspond to similar binding site patterns. Then, a key step in the analysis of gene 
expression data is the detection of groups of genes that exhibit similar behaviour of expression patterns (i.e. 
are coexpressed), based on the idea that these genes share common regulatory or functional roles, assumption 
that has proved right in many experiments (see for example (T9i ). 

The challenge is then transformed into the problem of clustering genes into groups based on their similarity in 
expression profiles. Instead of clustering genes, experimental conditions can also be clustered, the task being 
now to find groups of experimental conditions (which can be, for example, tumour samples) across which all 
the genes behave similai^ly. This type of clustering can be helpful for diagnosis. In gene expression, elements 
to be clustered ai^e usually genes and the vector of each gene is its expression pattern; similarity between 
genes can be measured in various ways that ai^e problem dependent, for example by the correlation coefficient 
between vectors. The goal is to partition the elements into subsets, i.e. clusters, so that elements in the same 
cluster are highly similar to each other and elements from different clusters have low similarity to each other. 



An essential step to obtain an effective cluster analysis is the preprocessing of the initial data. Although this 
work does not attempt to give a detailed explanation of the various preprocessing steps, the most common 
ones are mentioned in the following. The first step is the normalization of the hybridization intensities 
within a single experiment or across experiments. Besides, expression ratios are not symmetrical in the 
sense that upregulated genes have expression ratios between one and infinity, while downregulated genes 
have expression ratios squashed between one and zero. Taking the logarithms of these expression ratios results 
in symmetry between expression values of up- and downregulated genes. This preprocessing step is called 
nonlinear transformation of the data. Other, but uncommonly used transformations, include square, square 
root, and inverse transformations. Missing value replacement is another step, and is made only in the case 
of using a clustering algorithm that are not able to handle missing values due to technical reasons in the 
data. Some genes do not really contribute to the biological process because their expression values show 
little variation over the different experiments, and another problem are expression profiles with a considerable 
number of missing values. This non desirable data is removed in the filtering process. Biologists are mainly 
interested in grouping gene expression profiles that have the same relative behaviour, i.e. genes that are up- 
and downregulated together. Genes showing the same relative behaviour but with diverging absolute behaviour 
will have a relatively high Euclidean distance, and cluster algorithms based on this distance measure will 
therefore wrongfully assign the genes to different clusters. This can be prevented by applying standardization 
or rescaling to the gene expression profiles so that they have zero mean and unit standard deviation. ( flDl '). 

In a typical experiment to monitor gene expression levels several DNA chips aie used, and since each DNA chip 
contains thousands of spots, a huge amount of information is obtained. These results are summarized in a G x 5 
expression table, in which G represents the number of genes placed on every chip and S is the number of DNA 
chips used (each chip accounting for different conditions, experiments, time points or samples). Therefore, 
each row on this matrix corresponds to one particular gene and each column to a different sample. Each 
element Egs of the matrix represents the expression level of gene g in sample or condition s. Each column 
is called the profile of that condition, and each row vector is the expression pattern of a gene across all the 
conditions, commonly named expression profile. If the input data for a clustering problem is given in this 
form, it said to be as fingerprint data. Other type of input data is similarity data, where pairwise similarity 
values between elements ai^e used. These values can be computed from fingerprint data. Alternatively, the 
data can represent pairwise dissimilarity. Fingerprints contain more information than similarity values, but the 
latter can be used to represent the input to clustering in any application. Moreover, the fingerprint matrix is 
of order G x S while the similarity matrix is of order G x G. It is important to note that in gene expression 
applications typically G » S, while in tissue classification applications often G << S l20ll . 

We need a way to measure the similarity (or distance) between the genes or samples being compared and 
clustered. We can regard these rows or columns in the matrix as points in ?i-dimensional space or as n- 
dimensional vectors, where n is the number of samples for gene comparison, or number of genes for sample 
comparison. The natural, so called Euclidean distance between these points in the n-dimensional space may 
be the most obvious, but not necessarily the best choice. There is no theory how to choose the best distance 
measure. Possibly one right distance measure in the expression profile space does not exist, and the choice 
should depend on the problem studied [7 |. Some distance metrics commonly applied are the following: 

1. Pearson correlation. The Pearson correlation r is the dot product of two normalized vectors (i.e. the 
cosine between two vectors). It measures the similarity in the shapes of two profiles, while not taking the 
magnitude of the profiles into account, and therefore suits well the biological intuition of coexpression. 

2. Squared Pearson correlation. This is the square of the Pearson correlation, which considers two vectors 
pointing in the exact opposite directions to be perfectly similar, which might also be interesting for 
biologists (because repression is a form of coexpression). 

3. Euclidean distance. Euclidean distance measures the length of the straight line connecting the two 
points. It measures the similarity between the absolute behaviours of genes, while the biologists are more 
interested in their relative behaviours. Thus, a standardization procedure is needed before clustering 



using Euclidean distance. Importantly, after standardization, the Euclidean distance between two points 
X andy is related to the Pearson correlation by |jc — = 2(1 — \r\). 

4. Jackknife correlation. The jackknife correlation is an improvement for the Pearson correlation (which 
is not robust to outliers). Jackknife coiTclation increases the robustness to single outliers by computing 
a collection of all the possible leave-one-(experiment-)out Pearson correlations between two genes and 
then selecting the minimum of the collection as the final measure for the correlation. 

The first generation of cluster algorithms used for gene expression profiles were developed for purposes did 
not related with biological research (e.g. hieraixhical clustering, K-means and self organizing maps(SOM)). 
Although it is possible to obtain biologically meaningful results with these algorithms, some of their 
characteristics often complicate their use for clustering expression data. More recently, new algorithms have 
been developed specifically for gene expression profile clustering to overcome some of the limitations of earlier 
methods. These algorithms include, among others, model-based algorithms, the self-organizing tree algorithm 
(SOMA), quality-based algorithms, and biclustering algorithms. Also, some procedures have been developed 
to help biologists estimate some of the parameters needed for the first generation of algorithms, such as the 
number of clusters present in the data. While it is impossible to give an exhaustive description of all clustering 
algorithms developed for gene expression data, we try here to illustrate some of them. 

2.3.1 Hierarchical Clustering 

Agglomerative or hierarchical clustering algorithms ( Eli ) are among the oldest and most widely used 
clustering methods applied to gene expression data. Typically, the algorithm takes each expression profile 
as one cluster at the beginning. Then computes the distance between every pair of clusters, and the pair of 
clusters with the minimum distance is merged; the procedure is carried on iteratively until all elements ends 
into one single cluster. The whole clustering process is presented as a tree called a dendrogram and the original 
data are often reorganized in a heat map demonstrating the relationships between genes or conditions. After 
the full tree is obtained, the determination of the final clusters is achieved by cutting the tree at a certain level or 
height, which is equivalent to putting a threshold on the pairwise distance between clusters. The decision of the 
final clusters is arbitrary, because it is difficult to predict which level will give the best biological results. Note 
that the memory complexity of hierarchical clustering is quadratic in the number of gene expression profiles, 
which can be a problem due to the large number of genes involved in experiments. 

As in every step of agglomerative clustering, the two subsets that are closest or more similar to each other are 
merged, the distance between two clusters has to be defined. There are four common options: 

1 . Single linkage. The distance between two clusters is taken as the distance between the two closest data 
points, each point belonging to a different cluster. 

2. Complete linkage. The distance between the two furthest data points, each one in a different cluster. 

3. Average linkage. Both single linkage and complete linkage are sensitive to outliers. Average linkage 
provides an improvement by defining the distance between two clusters as the average of the distances 
between all pairs of points in the two clusters. 

4. Ward's method. At each step of agglomerative clustering, instead of merging the two clusters that 
minimize the pairwise distance between clusters, Wai^d's method (lEU) merges the two clusters that 
minimize the "information loss" for the step. The "information loss" is measured by the change in the 
sum of squared error of the clusters before and after the merge. In this way. Ward's method assesses the 
quality of the merged cluster at each step of the agglomerative procedure. 

These methods yield similar results if the data consist of compact and well separated clusters. However, if 
some of the clusters are close to each other or if the data have a dispersed nature, the results can be quite 



different. Ward's method, although less well known, often produces the most satisfactory results E^ . 

Eisen et al. developed a clustering software package based on average linkage hierarchical clustering 1191 . The 
clustering program is called Cluster, and the accompanying visualization program is called Tree View. Both 
programs are available at http://rana.lbl.gov/EisenSoftware.htm. 

2.3.2 K-Means Clustering 

K-means clustering (l24l. ll25l^ is a simple and popular partitioning method for data analysis. The number 
of clusters K in the data is needed as an input for the algorithm. K-means starts by assigning at random all 
gene expression profiles to one of the K clusters. Iteratively, the center, which is nothing more than the average 
expression vector of each cluster, is calculated and then the gene expression vectors are reassigned to the cluster 
with the closest cluster center. The initial mean vectors are called the seeds. The iterative procedure converges 
when all the mean vectors of the clusters remain stationary or the given number of iterations is exceeded. Since 
it is difficult to predict the number of clusters in advance, the predefinition of the number of clusters by the user 
is arbitrary. In practice, this implies the use of a trial-and-error approach where a comparison and biological 
validations of several runs of the algorithm with different parameter settings are necessary r ilTTl '). Another 
parameter that influence the result of K-means clustering is the choice of the seeds. The algorithm suffers from 
the problem that with different seeds the algorithm can yield very different result. 

2.3.3 Self- Organizing Maps 

SOM (||26l) is a technique developed by Kohonen for fitting a number of reference vectors to the distribution 
of gene data, by means of a set of nodes. The nodes are the intersections of a two-dimensional grid (usually of 
hexagonal or rectangular geometry). In the high-dimensional input space (with the gene expression vectors), 
each node represents a reference vector (similar to the mean vectors in the K-means algorithm). The dimension 
of the grid (e.g. lattice of 6x5 nodes) needs to be specified a priori. The initial position of the reference 
vectors is randomly chosen, and then the algorithm selects a random data vector p, identifies the node Up 
whose reference vector is closest to p, and updates the position of all reference vectors towards p according 
to a predefined learning function. The amount of position adjustment determined by the learning function 
decreases as the distance between n and Hp (in the grid) and the iteration number grow. The intuition for 
this learning process is that the reference vectors that are close enough to p will be pulled towards it, and the 
stiffness of the grid structure will propagate some of impact to neighbouring nodes. As a result, a reference 
vector is pulled more towards input vectors that ai^e closer to the reference vector itself and is less influenced 
by the input vectors located further away. In the meantime, this adaptation procedure of reference vectors is 
reflected on the output nodes (nodes associated with similar reference vectors are pulled closer together on 
the output grid). The algorithm terminates when convergence of the reference vectors is achieved or after 
completing a pre-defined number of training iterations. 

Because of the advantage in visualization, choosing the geometry of the output grid is not as crucial a problem 
as the choice of the number of clusters for a K-means method. Like the K-means method, the initial choice of 
reference vectors remains a problem that influences the final clustering result of SOM clustering. A good way 
to seed the reference vectors is to use the result from a principal component analysis(PCA) ll23l . 

Tamayo et al. fTT\ devised a gene expression clustering software, GeneCluster, which uses the SOM algorithm. 
The software is available at 

http://www.broad.mit.edu/cancer/software/genecluster2/gc2.html. 

2.3.4 Self-Organizing Tree Algorithm 

SOTA combines SOM and hierarchical clustering techniques. As in SOM, SOTA maps the input gene profiles 
to an output space of nodes. However, the nodes in SOTA, instead of being in a two-dimensional grid, are in 
the topology (or geometry) of a binary tree. The number of nodes in SOTA is not fixed from the beginning (in 



contrast to SOM) because the tree structure of the nodes grows during the clustering procedure. 

The initial system is composed of two external elements, denoted as cells, connected by an internal element 
that its called node, like a tree with two leaves. Each cell (or node) is a reference vector with the same size 
as the gene profiles. In the beginning, the entries of the two cells and the node are randomly initialized. 
The series of operations performed until a cell generates two descendants is called a cycle. During a cycle, 
cells and nodes are repeatedly adapted by the input gene profiles. Adaptation in each cycle consists on the 
presentation of all expression profiles to the network, and this implies two steps: first, each gene profile is 
associated with the cell whose reference vector is located closest to it, and second, the reference vector of 
this cell and its neighbouring nodes, including its parent node and its sister cell, are updated based on some 
neighbourhood weighting pai^ameters (which perform the same role as the neighbourhood function in SOM). 
Thus, a cell is moved into the direction of the expression profiles that are associated with it. The network 
follows its growing process by replicating the cell whose associated profiles exhibits the highest heterogeneity, 
i.e., the largest variability (defined by the maximal distance between two profiles that are associated with the 
same cell). This cell gives rise to two new descendants cells and become a node. The values of the two new 
cells aie identical to the node that generate them and the whole procedure starts again. The growing process 
ends when the heterogeneity of the system falls below a threshold. This threshold can be set to zero for a 
fully resolved dendogram similar to that provided by hierarchical clustering. If the threshold is obtained from 
the randomized distribution of data, SOTA will provide the cluster hierarchy that minimizes the probability of 
having missassigned genes to them [28J . 

SOTA has two crucial advantages: the topology is that of a hierarchical tree, and the clustering obtained is 
proportional to the heterogeneity of the data, instead of the number of items (this is due to the fact that SOTA 
is distribution preserving while SOM is topology preserving). In both SOM and SOTA, the training process 
changes the vectors in the nodes to weighted averages of the gene expression patterns associated to them. The 
advantage in the case of SOTA is that the binary topology produces a nested structure in which nodes at each 
level are averages of items below them (items that can be nodes or in the case of terminal nodes, genes). This 
makes it straightforward to compare average patterns of gene expression at different hierarchical levels even 
for lai^ge data sets I28II . 

2.3.5 Model Based Clustering 

Model Based Clustering assumes that the data are generated by a finite mixture of underlying probability 
distributions such as multivariate normal distributions. In this case, each cluster C, is represented by a 
multivariate Gaussian model pi in d dimensions: 

= (2-1) 

where yj is an expression profile and /x, and the mean and covariance matrix of the multivariate normal 
distribution respectively fllll . 

The covariance matrix Y.i can be represented by its eigenvalues decomposition, which in general takes the 
following structure: 

£ = A,AA,-Df, (2.2) 

where D, is the orthogonal matrix of the eigenvectors of A; is a diagonal matrix whose elements are 
proportional to the eigenvalues of , and A; is the constant of proportionality. This decomposition implies a 
nice geometric interpretation of the clusters: D; controls the orientation. A, controls the shape, and A; conti'ols 
the volume of the cluster. Note that simpler forms of the covariance structure can be used (e.g., by having some 
of the parameters take the same values across clusters), thereby decreasing the number of parameters that have 
to be estimated but also decreasing the model flexibility (capacity to model more complex data structures). 



The mixture model p itself takes then the following form: 



K 

p{yj) = Y.^iPi{yj\^^i.Y.)^ (2.3) 
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where K is the number of clusters and 71, is the prior probability that an expression profile belongs to cluster Q 
so that: 

K 

E^' = l- (2-4) 
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In practice we would like, given a collection of expression profiles yj{j =\,... ,n), to estimate all the parameters 
{7ti,lXi,Y,i{i = i,---,K), and K itself) of this mixture model. In a first step 7li,lXi,Y,i{i = i,---,K) are estimated 
with an EM algorithm using a fixed value for K and a fixed covariance structure flU . In the EM algorithm, 
the Expectation steps and Maximization steps alternate. In the E step, the probability of each observation 
belonging to each cluster is estimated conditionally on the cun^ent parameter estimates. In the M step the 
model parameters are estimated given the cuiTcnt group membership probabilities. When the EM algorithm 
converges, each observation is assigned to the group with the maximum conditional probability E^ . This 
parameter estimation is then repeated for different values for K and different covariance structures. The 
result is thus a collection of different models fitted to the data and all having a specific value for K and a 
specific covariance structure. In a second step the best model in this group of models is selected (i.e., the 
most appropriate number of parameters and a covariance structure is chosen here). This model selection step 
involves the calculation of the Bayesian Information Criterion (BIC) for each model l30l . which is not further 
discussed here. 

A good implementation for model based clustering (called MCLUST) is available at 

www.stat.washington.edu/fraley/mclust. Yeung et al. reported good results using this software on several 
synthetic data sets and real expression data sets. McLachlan et al. ISDl have also implemented model- 
based clustering in a Fortran program called EMMIX, which is also freely available from the web at 
http ://w w w. maths .uq. edu . au/gjm/emmix/EMMIX.f . 

2.3.6 Quality-Based Algorithms 

Quality-based algorithms produces clusters with a quality guarantee that ensures that all members of a cluster 
ai"e coexpressed (this property is called transitivity). This concept was introduced by Heyer, fouglyak and 
Yooseph, ( 1321 ') and their implementation is called QT _ Clust. The quality of a cluster C is defined as a 
diameter (equal to 1 —minij G Sjj, where Sij is the jackknife correlation between expression profiles / and j), 
but the method can be easily extended to other definitions. 

The algorithm considers every expression profile in the data set as a cluster seed (one could also call this a 
cluster center) and iteratively assigns the expression profiles to these clusters that cause a minimal increase in 
diameter until the diameter threshold, i.e., quality guarantee, is reached. Note that at this stage every expression 
profile is made available to every candidate cluster and that there are as many candidate clusters as there are 
expression profiles. At this point, the candidate cluster that contains the most expression profiles is selected as 
a valid cluster and removed from the data set where after the whole process starts again. The algorithm stops 
when the number of points in the largest remaining cluster falls below a threshold. Note that this stop criterion 
implies that the algorithm will terminate before all expression profiles are assigned to a cluster. 

This approach has some advantages, for example it is possible to find clusters containing highly coexpressed 
genes, and these clusters might therefore be good seeds for further analysis. Moreover, genes not really 
coexpressed with other members of the data set aie not included in any of the clusters. Some disadvantages 
are that the quality guarantee of the clusters is a user defined pai^ameter hard to estimate, it is hai^d to use by 
biologists, needs extensive parameter fine-tuning, and produces clusters all having the same fixed diameter not 
optimally adapted to the local data structure ITTIl . 



2.3.7 Adaptive Quality-Based Clustering 

Adaptive quality-based clustering (||33l) consist of a two-step approach. In the first step, a quality-based 
approach is performed to locate a cluster center in an area where the density of gene expression profiles is 
locally maximal using a preliminary estimate of the radius (i.e. the quality) of the cluster. In the second 
step, called adaptive step, the algorithm re-estimates the radius of the cluster so that the genes belonging to it 
are, in a statistical sense, significantly coexpressed. To this end, a bimodal and one-dimensional probability 
distribution (the distribution consists of two terms: one for the cluster and one for the rest of the data) describing 
the Euchdean distance between the data points and the cluster center is fitted to the data using an expectation- 
maximization (EM) algorithm. 

Finally, step one and two are repeated, using the re-estimation of the quality as the initial estimate needed in 
the first step, until the relative difference between the initial and re-estimated quality is sufficiently small. The 
cluster is subsequently removed from the data and the whole procedure is restarted. Note that only clusters 
whose size exceeds a predefined number are presented to the user. 

In adaptive quality-based clustering, users have to specify a threshold for quality control. This parameter has 
a strict statistical meaning and is therefore much less arbitrary (in contrast to the case in QT_Clust). It can be 
chosen independently of a specific data set or cluster and it allows for a meaningful default value (95%) that 
in general gives good results. This makes the approach user friendly without the need for extensive parameter 
fine-tuning. Furthermore, with the ability to allow the clusters to have different radius, adaptive quality-based 
clustering produces clusters adapted to the local data structure llll . An application of Adaptive Quality- Based 
Clustering to nervous system is found in 

However, the method has some limitations like it does not converge in every situation. A server running the 
program is available at http://homes.esat.kuleuven.be/thijsAVork/Clustering.html 

2.3.8 Biclustering and Some Physics Related Algorithms 

Clustering can be applied to either the rows or the columns of the data matrix, sepai^ately. Biclustering, on the 
other hand, performs clustering in these two dimensions simultaneously. This means that clustering derives a 
global model while biclustering produces a local model 135 1 . The term biclustering was first used by Cheng and 
Church [36l in gene expression data analysis. It refers to a distinct class of clustering algorithms that perform 
simultaneous row-column clustering. One of the eaiiiest biclustering formulations is the direct clustering 
algorithm introduced by Hartigan E^ . also known as block clustering. 

The goal of biclustering techniques is thus to identify subgroups of genes and subgroups of conditions, 
by performing simultaneous clustering of both rows and columns of the gene expression matrix, instead 
of clustering these two dimensions separately. We can then conclude that, unlike clustering algorithms, 
biclustering algorithms identify groups of genes that show similar activity patterns under a specific subset 
of experimental conditions ll^l . 

There are also several physics related clustering algorithms, e.g. Deterministic Annealing l37l and Coupled 
Mass ll38l . Deterministic Annealing uses the same cost function as K-means, but rather than minimizing it 
for a fixed value of clusters K, it performs a statistical mechanics type analysis, using a maximum entropy 
principle as its starting point. The resulting free energy is a complex function of the number of centroids and 
their locations, which are calculated by a minimization process. This minimization is done by lowering the 
temperature variable slowly and following minima that move and every now and then split (corresponding to 
a second order phase transition). Since it has been proven [39] that in the generic case the free energy function 
exhibits first order phase transitions, the deterministic annealing procedure is likely to follow one of its local 
minima ||5l. 

Finally, it is important to stress that clustering methods have been used in a large variety of scientific disciplines 
and applications that include pattern recognition |'40'|, learning theory ll4Tl . astrophysics ll42l . medical images 
and data processing [43], machine translation of text [44|, satellite data analysis B31 . as well as speech 



recognition 



2.4 Superparamagnetic Gene Clustering: Monte Carlo Simulations 

This method takes the data points generated by gene expression profiles as sites of an inhomogeneous Potts 
ferromagnet, and was first proposed by Eytan Domany et al. iHT^I. The presence of clusters in the data gives 
rise to magnetic grains, and working in the superparamagnetic phase, the SPC algorithm decides if a data 
point belong to the same grain using the pair conelation function of the Potts spins. Additionally, temperature 
controls the level of resolution obtained. 

A Potts system is said to be homogeneous when its spins are on a lattice and all nearest neighbour couplings 
are equal, Jij = /. This system exhibits two phases, at high temperatures is paramagnetic or disordered, and 
at low temperatures is ordered. In the disordered phase the con^elation function G,y decays to 1/^ when the 
distance between points v, and vj is large (remember from last chapter, that q is the number of possible states 
in the Potts model). This is the probability to find two completely independent Potts spins in the same state. 
At very high temperatures even neighbouring sites have G,y « 1 /^. As the temperature is lowered, the system 
undergoes a sharp transition to an ordered, feiTomagnetic phase, meaning that one Potts state dominates the 
system. At very low temperatures G;y 1 for all pairs v,-, vy, i.e. all spins have the same q l48l . 

In strongly inhomogeneous Potts models, spins form magnetic grains with very strong couplings between 
neighbours that belong to the same grain, and very weak interactions between all other pairs. At low 
temperatures such a system is also ferromagnetic, but as the temperature is raised the system may exhibit 
an intermediate, super-paramagnetic phase. In this phase strongly coupled grains aix aligned (i.e. aie in their 
respective ferromagnetic phases), while there is no relative ordering of different grains. This is illustrated in 



Figure 2.1: At high T all sites have different spin values, but as T is lowered, regions of aligned spins appears 
(superpai^amagnetic phase). At low T, the system is completely ordered. 

At the transition temperature from the ferromagnetic to super-paramagnetic phase a pronounced peak of % 
is observed l47l . As the temperature is further raised, the super-paramagnetic to paramagnetic transition is 
reached; each grain disorders and % abruptly diminishes by a factor that is roughly the size of the largest 
cluster. Thus the temperatures where a peak of the susceptibility occurs and the temperatures at which % 
decreases abruptly indicate the range of temperatures in which the system is in its super-paramagnetic phase. 
In principle, one can have a sequence of several transitions in the super-paramagnetic phase: as the temperature 
is raised the system may break first into two clusters, each of them in turn breaks into more (macroscopic) sub- 
clusters and so on. Such a hierarchical structure of the magnetic clusters reflects a hierarchical organization of 
the data into categories and sub-categories l49l . 

In concreteness, SPC method consists on three stages. First, to specify the Hamiltonian which governs the 
system. Second, find the temperature range where the superparamagnetic phase take place, taking into acount 
the susceptibility behaviour. Finally, the correlation of neighbouring pairs of spins, G,y is measured and, taking 



Fig.im 




into account these values, the clusters ai^e formed. 



2.4.1 Detailed Description of SPC 

Each expression profile is represented as a point in a D dimensional space, and a random spin value a,, 
/ = 1,2, is assigned to it. A small value q hinders the identification of the SPM clusters since different 
clusters are then forced to point into the same Potts direction. Too large q makes the calculations more 
cumbersome. However, the results depend only weakly on the value of q. In the next step, the neighbours 
of each spin v, are calculated using the K mutual neighbour criterion. This criterion initially calculates the K 
nearest points of each site. If v, has vy among its K nearest points, and vy, in turn, has v, as one of its K nearest 
points, then v/ and vj are considered as neighbours. 

The average number of neighbours K and the average of all distances a between neighbouring pairs v,- and vj 
are then computed, and finally the interaction couplings which will appear in the Hamiltonian will be calculated 
as follows: 



1 -J 

■^e ^ if v; and V ; are neighbours 

^ (2.5) 
otherwise 



Choosing Jij in this way creates strong interactions between spins associated with the data from high density 
regions, and weak interactions between neighbours that are in low density regions l50ll . 

Any different assignment of spins to data points S has an energy cost given by: 

H{S)=Y,Jij5o..o,, (2.6) 

where the sum is over neighbouring sites. The function Sa-^a- is the Kroenecker symbol taking the value 1 
when a, = Oj and otherwise. The lowest possible energy cost, H{S) = is attained when we assign the same 
spin to all points, which corresponds to all data points being assigned to the same cluster. Moreover, as one 
chooses interactions that are a decreasing function of the distance dij, then the closer two points are to each 
other, the more likely is for them to be in the same state. In summary, this Hamiltonian procedure penalizes 
placing spins at points i,j in different clusters, and this penalty decreases with the distance between the points 

ill- 

The next step is the calculation of magnetization, susceptibility and correlation function for pairs of neighbours 
Gij over a range of temperatures using Monte Carlo technique. The original creators of SPC used the S wendsen 
Wang algorithm. 

As the temperature increases, M varies from 1 to via sharp phase transitions. At low temperatures the system 
is fully magnetized and the fluctuations in m are negligible. As T increases to the point where the single cluster 
breaks into subclusters (or become completely disordered), fluctuations become very large. Hence, one expect 
to identify the transitions at which clusters break up by the sharp peaks of the susceptibility ISTTl . 

The strategy is to vary T and measure xi^). Transitions show up as peaks of At temperatures between 
transitions, we expect to observe relatively stable phases that correspond to some clusters being ordered 
internally and uncorrelated with other clusters. Within each such phase, Gij is measured. The value of Gij 
is the probability to find the two Potts spins a, and Oj in the same state, i.e. the probability to find them in the 
same cluster. By the relation to granular ferromagnets we expect that the distribution of Gij is bimodal; if both 
spins belong to the same ordered grain (cluster), their coiTclation is close to 1; if they belong to two clusters 
that are not relatively ordered, the conelation is close to 0. Rather than thresholding the distances between 
pairs of points to decide their assignment to clusters, we use the pair correlations, which reflect a collective 
aspect of the data's distribution [49|. 



Clusters are identified in tiiree steps: 



1. Build the cores of the clusters using a thresholding procedure. If Gij > 0.5, a link is set between the 
neighbour data points v,- and vj. The resulting connected graph depends weakly in the value used in this 
thresholding, as long as it is bigger than l/q and less than l—2/q l49l . The reason is that the distribution 
of the correlations between two neighbouring spins peaks strongly at these two values and is very small 
between them. 

2. Capture points lying in the periphery by linking each point to its neighbour of maximal correlation. Of 
course, some points were already linked in step one. 

3. Data clusters are identified as the linked components of the graph obtained in the previous steps. 
The temperature controls the resolution at which the data ai^e clustered. 

It is intuitively clear that if a set of data points form a dense cloud, isolated from the rest of the data, 
the corresponding spins will form a ferromagnetic domain at some low temperature, which will become 
paramagnetic and lose its correlations only at a high temperature. Hence the size of the temperature interval 
dT in which such a fen^omagnetic domain exists can be used as a measure of the stability and significance of 
the corresponding data cluster. 

Some of the demonstrated useful properties of SPC are the following: (a) the number of clusters is determined 
by the algorithm itself and not externally prescribed (as is done by SOM and K-means); (b) presents stability 
against noise; (c) generates a hierarchy (dendrogram) and provides a mechanism to identify in it robust, stable 
clusters (by the value of dT ); (d) ability to identify a dense set of points forming a cloud of an iiTcgular (non- 
spherical shape) as a cluster 

The SPC method has been used in various contexts, like computer vision f52\, speech recognition f49\ and 
identification of clusters of companies in stock indices ll53l . Its first direct application to gene expression data 
has been for analysis of the temporal dependence of the expression levels in a synchronized yeast culture [541. 
identifying gene clusters whose variation reflects the cell cycle. Subsequently, the SPC was used to identify 
primary targets of p53 PS j, the most important tumour suppressor that acts as a transcription factor of central 
importance in human cancer. SPC has been used also to cluster protein sequences 1561 . and to classify or 
identify new genes associated with colon and skin cancer BTI . 

2.4.2 Future Directions 

The location of the superparamagnetic phase in the SPC algorithm is closely related to the phase transitions 
occurring in the system. The introduction of the Wolff algorithm instead of the originally used Swendsen- 
Wang algorithm will probably improve the efficiency of the method, and this is left for future investigations, 
as well as a comparison of different methods with the SPC algorithm. 
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3.1 History of Gompertz Equation 



In 1724 Moivre presented his hypothesis of uniform decrement, summarized in the expression y(x) = K{w — x), 
where y{x) represents the surviving persons with age x, K is the slope or velocity with which the population 
diminishes in the mortality table, and w is the maximum survival age for the population. The Moivre straight 
line was recommended for an age range between 12 and 86, in which it adjusted better. This linear hypothesis 
was exceeded by Benjamin Gompertz, who beUeved in the existence of two general causes of mortality: chance 
and the increasing inability of men to avoid death. Gompertz took into account only the biological causes, and 
his hypothesis was based on the following idea: "Men resistance to death diminish with time in a proportional 
rate" UJ. 

Benjamin Gompertz was a British mathematician interested, besides other subjects as astronomy, in the 
problem of life insurances and mortality rates in the nineteenth century. He worked with death and population 
records of people in England, Sweden and France between ages 20 and 60 and noted that the arithmetic 
increases in age were consistently accompanied by geometric increases in mortality, and that this law of 
geometrical progression appeared in large portions of the different tables of mortality. Nowadays, the simple 
formula describing the exponential rise in death rates between sexual maturity and extreme old age, Y{t) = 
is better known as Gompertz equation. In his first paper about this subject published in 1820 f51, Gompertz 
identified this peculiar pattern among different european populations for a limited portion of the age range. For 
his second paper, Gompertz used equal intervals of longer periods of time than in his previous work and found, 
for example, that the differences in the natural logarithm between successive 10 year age intervals between 
ages 15 and 55 in a mortality table for Deparceaux, France, were all nearly identical. Gompertz believed 
he had discovered a general law of mortality after observing similar patterns of geometrical progression in 
other tables of mortality, and published it in 1825 in the Philosophical Transactions of the Royal Society, in 
a paper whose title was "On the Nature of the Function Expressive of the Law of Human Mortality" Q- In 
his third paper he improved his original notation and finally presented the last one in 1 860, published after his 
death, where he noted that in his primary equation for geometric progression, the parameters were supposed to 
represent constant quantities for a very long term of years I4j. 

From 1825 to 1862 Gompertz was involved on the subject of what was called vital statistics in an effort 
to understand why there were consistent age patterns of death among people. Gompertz assumed that human 
beings have certain powers of integration and that those powers could be divided into a principal or fundamental 
part and an auxiliary part responsible for the maintenance of the principal power of integration. This auxilar 
force is some kind of recuperative force, a power to oppose destruction that the organisms lose in equal 
proportions in equal small intervals of time. Gompertz also beUeved on the presence of powers destroying 
this auxiliary force and multiplied this hypothetical force against life by the population alive to estimate the 
number of deaths in the age interval. Gompertz realized that if the force to destroy life operated equally on 
everyone, then all individuals should have the same length of life, something he knew could not be true. As a 
possible explanation, Gompertz emphasized the importance of chance in the timing with which death occurs. 
At that time the concept of genetic heterogeneity was not known, instead, Gompertz invoked chance to explain 
why members of a presumed homogeneous cohort die at different times Q. 

After Gompertz death, the subject remained mostly unknown in the scientific community. In 1860, W. M. 
Makeham improved Gompertz law of mortality incorporating a term due to chance in the equation. He noted 
that the logarithms of the probabilities of living from Gompertz's formula increased at a faster pace at higher 
age than at younger ages, so he developed a theory of partial forces of mortality that intended to explain this. 
Makeham linked the diseases associated with the diminution of the vital power to specific organ systems -the 
lungs, heart, kidneys, stomach and liver, and brain. These diseases represented a significant portion of total 
mortality at that time and worked well in solving the observed problem of greater increased forces of mortality 
at older ages than at younger ages. His formula accurately portrayed the mortality experience of various human 
populations between ages 10 and 95 |6]|. 

Gompertz and Makeham recognized that the original Gompertz equation did not apply to the entire age range. 



the formula was intended to apply only between the ages of 20 and 60. In fact, Gompertz suggested in his last 
paper that there are four distinct periods in the life span between which separate laws of mortality apply: birth 
to 12 months, 12 months to 20 years, 20 years to 60 years, and 60 years to 100 years. Even within this range 
he recognized that his formula worked best "provided the intervals be not greater than certain limits." The 
applicability of the Gompertz function to only a specified range within the life span have been recognized by 
many researchers but still nowadays some researchers reject the entire Gompertz paradigm after finding that it 
does not apply to older ages for some organisms |5l|. 

Scientists started searching biological explanations for Gompertz's law of mortality until the first years of 
the twentieth century, motivated in part by the fact that increases in mortality among nonhuman species also 
followed Gompertz's law for a lai^ge portion of their life span. Differences among species were assumed to be 
just a matter of scale. 

Brownlee (1919) suggested that mortality due to senescent causes should be expressed first at about age 12, 
become the dominant force of total mortality by age 30, and advance at an exponential rate from ages 12 to 
85. He also recognized that a law of mortality was likely to be obscured by nonsenescent mortality. Brownlee 
identified a formula that accurately describes the rate of decay of substances subject to the action of organic 
ferments (i.e., bacteria exposed to a disinfecting solution) which he believed produced a time dependent decay 
analogous to the loss of vital power. He found that his formula corresponded to Makeham's adjustment of 
Gompertz's equation, leading him to the conclusion that life depends on the energy of certain substances in the 
body, an energy which is gradually being destroyed throughout life Q. 

Wright (1926) ||8| appears to have first suggested the use of the Gompertz curve for biological growth. 
Following Wright, Davidson (1928) used the Gompertz curve to represent the growth in body weight of cattle 
lITOl . Weymouth, McMillin and Rich (1931) used the Gompertz curve to represent the growth in shell size of 
the razor clam ||9||- They stated that the curve also gives good fits for the guinea pig and the rat. It must be 
noted that they have found necessary in their most extensive series, the use of two different curves to graduate 
the first and second halves of their data. Weymouth and Thompson (1931) also applied the Gompertz curve to 
the growth of the Pacific cockle flTll . Since then, a number of authors fitted Gompertz formula to growth data 
for animals and organisms with remarkable success. 

Already in 1934, Casey fitted the Gompertz model to tumour growth data and was followed by numerous 
authors 1 121. The general conclusion has been that the Gompertz law very well describes tumor growth, but a 
biological explanation for this success has not been found. 

The first person who attempted to perform an interspecies comparison of mortality, in this case, the mortality 
schedules of Drosophila and humans, was Raymond Pearl. Pearl (1921) plotted the survival curves of US 
males in 1910 on a scale with those of the longwinged male Drosophila llOI . Although Pearl acknowledged 
the arbitrary nature of this comparison, particularly in the choice of the beginning age interval for both species, 
he demonstrated a remarkable similarity in the curves. In his second study (1922), Pearl refined his interspecies 
approach and found that the form of the distributions was fundamentally the same llT4l . In addition, he found 
that humans had a higher life expectancy at every age relative to the Drosophila, a discovery that he attributed to 
humans' control over their environment. Pearl was the first to manipulate experimentally the living conditions 
of his study populations to test the importance of accidental deaths on the survival curves. He was convinced 
that his research would reveal a "fundamental biological law" of mortality for more than one species, but after 
two decades of research using this scaling approach on an expanded repertoire of species. Pearl and Minor 
lITSl emphatically declared that a universal law of mortality did not exist. Pearl and Minor identified what 
Makeham had identified 68 yeai^s before as the main problem -the inability to partition total mortality into its 
intrinsic and extrinsic causes of death. 

In the 50's, researchers turned to the use of radiation, which they thought was a method to accelerate 
senescence, for understanding aging and making interspecies extrapolation of mortality risks. George Sacher 
(1950), a pioneer in this field, assumed that the effects of radiation combined additively with natural aging, 
without introducing new pathology L16J . Under this assumption, the Sacher model accounted for natural aging 



by the inclusion of a simple linear time dependent term to the integral lethality function for radiation injury. He 
observed that at low daily dose rates, the reciprocal difference in mean survival times for a control and for an 
irradiated population was proportional to the intensity of exposure. In 1952 Austin Brues and George Sacher 
envisioned injury as a process that disrupts the normal physiological oscillations about a mean homeostatic 
state within an organism, and that there were lethal injuries that an organism could not tolerate. Brues and 
Sacher noted that this biological model of injury and failure lead directly to the formulation Gompertz derived 
to describe his law of human mortality lITTl . Using mean survival times, Sacher estimated cumulant lethality 
functions to compare empirically the similarities and differences in species' responses to radiation injury within 
phases of the injury process. Sacher and Trucco, however, noted that they had insufficient knowledge about the 
fluctuation process in real systems and that the very fact of performing an observation introduced a disturbance 
in the study fWi . 

Like Brody before him, Failla (1958) defined vitality as the reciprocal of the age specific mortality rate llT9l . 
After expressing the Gompertz function in terms of vitality, he suggested that the resulting equation described 
the loss of vitality from a one hit random process acting on the cell population. Failla concluded that the 
vitality curve must describe a deterioration in the function of cells with age. He attributed the deterioration of 
function to somatic mutations, and interpreted the Gompertz aging parameter (derived from mortality data) 
as an estimate of the spontaneous somatic gene mutation rate per cell per year. With some assumptions 
about generation length and the number of genes in diploid cells, Failla (1960) calculations suggested that 
the mutation rate per generation was similar- across species l20l . This would imply that the somatic mutation 
rate per unit time is higher in short-Uved animals than in animals with longer life span. 

Szilard (1959) also developed a theory on the nature of the aging process based on the concept of accumulated 
somatic damage iBTj. Inherited mutations in somatic genes whose function is critical late in the life span 
was viewed as the major explanation for the different lengths of human beings' life. Like Sacher's lethal 
bound, Szilai^d envisioned death occurring when the fraction of somatic cells unaffected by mutation reached 
a critical threshold. He suggested that the magnitude of Ufe shortening following exposure to radiation should 
be inversely related to the square root of the number of chromosomes of a species. As such, mice and humans 
should experience a similar radiation-induced life shortening when expressed as a fraction of the life span. 

The quantitative as well as the biological importance of the Gompertz distribution was further enhanced by 
the work of Bernard Strehler and Albert Mildvan (1960) [22|, these investigators presented a Gompertz-based 
theory of mortality and aging that was based on disruptions of the homeostatic state of an organism. Their 
approach differed from that of Sacher in the functional form of the equations used to describe the disturbances 
of the " energetic environment" of an organism challenged by stress. Strehler also made several important 
observations of the biological effects of radiation compared to the effects of aging. He noted that (1) aging 
effects are typically associated with post-mitotic cell whereas radiation primarily affects dividing cells; (2) 
radiation damage is primarily genetic whereas the effects of aging appear to be more broad spectrum; (3) some 
species (e.g., Drosophila) do not exhibit life shortening even after large doses of radiation; and (4) the dose 
required to double the mortality rate (i.e., Gompertz slope) produces a much lai^ger increase in the mutation 
rate. Based on this observations, Strehler rejected the notion that radiation acts through a general acceleration 
of the normal aging process. 

Studies of radiation effects continued to make extensive use of the Gompertz distribution throughout the 
1960's. Like Greenwood (1928) before him [ 23J, Grahn (1970) proposed to use the ratio of Gompertz slopes to 
adjust for life span differences when making mortality comparison between species l24ll . Grahn successfully 
used this scaling approach to predict reductions in human life expectancy following radiation exposure from 
doses response relationships observed in mice. 

It seems that within the field of radiation, extrapolation between species had some success, but this differs from 
Pearl's conclusion that a fundamental law of mortality applying to various species does not exist. The reason 
lies on the environmental conditions of the animals being compared, because Pearl's studies were based on the 
comparison with species that experienced high levels of exogenous mortality, and the laboratory animals used 



in radiation studies came from controlled environments without predation and where infectious diseases were 
minimized. These environmental conditions are far more similar to the sheltered environment and medical 
attention received by humans, leading to a better comparison between species |5l|. 

The modern development of biodemography originated with a series of articles published by Weiss and 
colleagues l25l . Weiss (1990) recognized that the field of genetic epidemiology could provide insights into 
the biological constraints influencing the shape of the mortality function in populations. Weiss's merging of 
the fields of demography and genetics and his subsequent elaboration using principles of evolutionary biology 
served as a launching point for the latest developments in the field of biodemography. 

For most species survival beyond the age of reproduction is an extremely rare event with most deaths for a 
cohort occurring just after birth. At these ages the vast majority of deaths result from forces of mortality that 
are unrelated with senescence (e.g., predation or diseases). In this hostile environments, early reproduction 
has become an essential element in species' reproductive strategies ( 1261 ). Consistent patterns of growth and 
development observed within species suggest that the reproductive biology of organisms aUve today represents 
a genetic legacy of responses to environmental conditions that prevailed during early evolutionary history of 
each species. The modern evolutionary theory of senescence is based on the premise that selection is effective 
in altering gene frequencies until the time before the end of the reproductive period. When the normally high 
force of external mortality is controlled and survival beyond the end of reproductive period becomes a common 
occurrence, senescence and senescent-related diseases and disorders have the opportunity to be expressed. 
Because there are common forces (i.e., extrinsic mortality) responsible for molding species 'reproductive 
strategies, a common pattern of intrinsic mortality, an evolutionary imprint, may become visible when species 
are compared on a biologically comparable time scale. Carnes et al. lITTl have argued that the timing of 
genetically determined processes such as growth and development are driven by a reproductive biology, molded 
by the necessity of early reproduction, which in turn is driven by the normally high external force of mortality. 
If individual senescence is an inadvertent consequence of these developmental processes as predicted from 
the evolutionary theory of senescence, then age patterns of intrinsic mortality in a population should also be 
calibrated to some element(s) of a species's reproductive biology. These ideas have been introduced in various 
computational models. 

Recent mortality schedules reveal a more pure biological influence because the external causes of death 
have been dramatically reduced by medical and technological advances and almost everyone now lives to 
his biological potential. At the same time, a greater understanding of biological processes has also allowed the 
modification of intrinsic mortality (e.g. medicine, treatments and operations) altering the survival trajectories 
of individuals whose intrinsic diseases have already been expressed. From this perspective, the biological 
life span of a specie is one based on a mortality schedule that would prevail in the absence of survival time 
manufactured by medical or pharmaceutical intervention of any kind - a view consistent with that of Raymond 
Pearl. When enough members of a population benefit from these medical interventions, it is possible that the 
life span of the population will exceed its biologically based limits. All past research on mortality suggests 
that Gompertz was right all along: there are biological reasons for why death occurs when it does, and a law 
of mortality for many species may very well exists. Which is the limit imposed by this law of mortality for 
humans, and the degree to which these limits can be manipulated is still subject of great interest O). 

The Gompertz equation was developed exclusively for human beings both as an empirical tool to describe the 
age pattern of death from all causes during a limited time frame, and as representing a law of mortality that 
arises from inherent biological processes. Gompertz never imagined that his equation would become a tool 
used in the analysis not only of failure time of organisms but also of failure time of mechanical devices and in 
the description of biological and tumour growth. 



3.2 l\imour Growth Equations 



A mathematical model of tumour growth is a mathematical expression of the dependence of tumour size 
in time. The common feature is that growth follows a sigmoid curve with three distinct phases: the initial 
exponential phase, the linear phase and the plateau. The most widely used framework is consider tumour 
growth as a dynamical system described by ordinary differential equations, although some growth models are 
formulated successfully also by partial differential equations. 

The simple tumour growth model is described by a single, first order, autonomous differential equation: 



where y{t) > is tumour size at time t and f{y) is a function describing the growth rate. The solution of 
(13.11) has the remarkably property of a monotonic ascending function of time when f{yo) > 0, or a monotonic 
descending function of time when f{yo) < 0. In the case of an ascending function, this implies that the 
stationary (critical) point corresponds to the maximum possible tumor size, y„j, achieved for t —^oo. Similarly, 
in the case of a descending function, the stationary point achieved for f ^ oo is 3;^. > 0. The model given by 
Eq. (I3.1I) describes continuous tumour growth which asymptotically approaches the finite value y,n or infinity 
(that corresponds to the unattainable unrestricted growth). On the other hand, (13. U can describe continuous 
tumour regression from size y = yo to extinction (j = 0) at some finite time or when ? — > 00. However, the 
solution of (13.11) can not describe oscillatory tumour growth with regressions and relapses. The solution y{t) 
represents a sigmoidal ascending curve characteristic of tumour growth if a unique point of inflection exists. 
This condition can be achieved for some simple functions f{y). It is conceivable that functions f{y) exist 
which yield solutions with multiple inflection points resulting in "multisigmoidal" curves. Such curves would 
describe tumor growth with recun^ent stagnation phases ll28l . 

More complex models of tumour growth kinetics are described by systems of ordinary autonomous first-order 
differential equations: 



for / = 1, . . . ,« and with initial conditions ^(0) =yo > 0, Xi{0) = xq. Here x,-, . . . ,x„ are variables describing 
various factors responsible for tumour growth (e.g., levels of available nutrients, growth factor activity, size 
of quiescent cell population, etc.). The functions / and fi and the variables x, are chosen to represent growth 
mechanisms of particular interest. Unlike the simple model given by Eq. ( I3.lt . the system of two differential 
equations {n = 1 in Eq. (I3.2t ) can describe smooth oscillatory tumor growth L281 . 

There is no further advance without specifying model functions f{y) that represent tumor growth mechanisms. 
The first approach is to consider the classical chemical kinetics paradigm, based on mass conservation. For 
tumour growth this paradigm can be expressed in its simplest form by: 



The tumor size (mass) at time t + At is equal to the size at time t enlarged by G{y{t))At (generation of mass) 
during the small time interval At, and diminished by D{y{t))At (degradation of mass) during the same time 
interval. The functions G{y) > and D{y) > are the growth and degradation rates respectively, assumed to 
depend on tumor size only. Within the limit of ? — > 0, i3.3l becomes a differential equation: 



y{t)=ny) 



y{0)=yo>0, 



(3.1) 




(3.2) 



y{t + At) = y{t) + G{y{t))At - D{y{t))At. 



(3.3) 




(3.4) 



• G(yo)>£>(3'o); 



• Only one solution y,„ > of G{y) = D{y) exists as does only one solution j,- > of ^^y^ = '^^^ , and 

• yi < Jm- 

In the latter case, y,„ is the maximal tumor size achieved asymptotically and yi is the tumor size at the inflection 
point. The stated conditions can be met easily if both G{y) and D(y) are monotonic ascending functions. In 
a typical kinetics paradigm, these functions ai^e given by the power function, ky'\ where k is the rate constant 
and n is the order of the process. 

The second approach takes a fundamental idea: tumor growth results from exponential cell proliferation (often 
called "Malthusian growth") described by: 

— = av, a > 0. (3.5) 
at 

This equation describes unrestricted growth leading to infinite tumor size, a notion not supported by 
observation. Initially tumor growth behaves approximately according to (13. 5t . but eventually it becomes 
stagnant due to restrictions within the tumor itself and those imposed by the environment. Thus, exponential 
growth must be modified to include terms that restrict growth. This can be achieved by multiplying y on the 
right-hand side of i3.5l with a function F{y) > satisfying limy^y^F{y) = 0. The con^esponding differential 
equation is: 

^ = ayFiy). (3.6) 

Biologically, the function F{y) can be interpreted as a growth function, i.e. as the ratio of proliferating cells in 
tumour versus total cell population, or more generally, the ratio of growing tumour mass versus total tumour 
mass. The consequence of this interpretation requires that F{y) <l and that parameter a be interpreted as the 
growth rate constant for the hypothetical unrestricted growth. 

The maximal tumor size, y„j, predicted by the model is often designated as carrying capacity, 5 > 0, of the 
environment for tumors in vitro or of the host for tumors in vivo. It is useful to introduce S explicitly into the 
growth fraction model: 

ccygd), y{0)=yo>0. (3.7) 



dt \Sy 

Mathematically, both considered approaches [yielding Eq. i3Al or Eq. (I3.7t 1 are equivalent and one can easily 
transform one equation into the other. However, on the vantage point of modeling and interpretation, the two 
approaches are quite different. The same differential equation can yield an intuitively acceptable interpretation 
in one approach, while it can lack a transparent interpretation in the other. The paradigms of mass conservation 
and growth fraction can obviously be used in development of more elaborated models yielding systems of 
equations Eq. (|l2l lEH. 

3.2.1 Exponential Growth 

If the number of cells in a tumour at time t is denoted by y{t), then, at time t + At, the number of cells would be 
expressed a.sy{t + At). The number of cells added to the tumour in the time interval At can be found subtracting 
y{t + At) —y{t), but this number is proportional to the duration of the time interval (i.e. more cells arrive in a 
long interval than in a short one) so: 

y{t + At)-y{t) = NAt, 

y(t+At)-yit) _ ^ (3-«) 
Ar — ly. 

Suppose that the increase in number of the cell population is due entirely to cells being born. As time progresses 
the division or birth rate may be altered so that more or less divisions occur, so the number of cells born in the 
interval At may vary with time. Moreover, if there are more cells at time t, more divisions are likely to occur 



and A'^ will also depend on y{t). Letting At 0, the left-hand side of Eq. (I3.8I) becomes the derivative of y with 
respect to t, and we have: 



dyjt) 
dt 



^N{t,y{t)}, 



(3.9) 



where we show explicitly the quantities on which N depends. The expression {\/y){dy/dt) is known as the 
specific growth rate. Therefore, another way of describing Eq. (I3.9t is to say that the specific growth rate is 

N{t,y)/y. 

It is plausible to assume that, in a short time interval, there will be about twice as many births as in a time 
interval of half its length. Thus, one could expect that the number of births would be proportional to y{t)At 
when Af is small. If the birth rate does not change in the time interval, A? can be expressed as ay{t)At with a 
a suitable constant. Then Eq. (13.91 ) becomes: 



dy{t) 
dt 



ay{t), 



(3.10) 



which states that the specific growth rate is a, the same for all times and all sizes of tumour. This equation has 
the same form of the expression found in Eq. (13. 5t and its solution can be realized by the following procedure: 



"Jo"' — Jy(0) y 



0"' - /y| 

at = ln{y{t)/y{0)}, 



(3.11) 



leading to: 



y{t)=yoe'", (3.12) 

where yo is any constant that can be fixed by putting f = in Eq. (13.121 1. and evidently is the size of the tumour 
at f = 0. 
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Figure 3.1: Exponential growth, withyo = 1, « = 1. See Eq. (I3.12t 



The behaviour of a tumour, or a population, as time increases according to (I3.12t is displayed in figure (13. U . 
The size grows steadily, and the increase becomes dramatic as time goes on. Of course, in any real situation, 
there will be a limit to the growth because of a shortage of essential supplies or insufficient food. Nevertheless, 



many organisms exhibit exponential growth in their initial stages E^ . 

Notice that Eq. (I3.10t has been derived on the assumption that only births can occur. In the event that there 
are deaths but no births the same equation can be reached. However, a is now a negative number since the 
population or cell number decreases in the time interval A?. It follows from (I3.12t that the population decays 
exponentially with time from its size at f = 0. 

More facets of the population problem can be incorporated in this equation. For instance, we may postulate 
that the number of deaths in the short time interval A? is j8j(?)A?. Similaiiy, individuals may enter the given 
area from outside, say I{t)At immigrants in the interval At. Likewise, some may depart from the area giving 
rise to E{t)At emigrants. We can model this population facets via the following equation: 



y{t + At)-y{t) = ay{t)\t - py{t)At + I{t)At - E{t)At, (3.13) 

leading to: 

^ = {a-l5)yit)+Iit)-Eit), (3.14) 

when At —>■ 0. More generally, / and E could be made to depend on y so that Eq. (I3.14t (often called Verhulst's 
differential equation) can be difficult to solve. Notwithstanding, it is transparent that, if we hope to predict 
the size of a population at a given time, to find the solution of a differential equation will be an essential 
requirement ||29| . 



3.2.2 Logistic Growth 

A characteristic that must be taken into account is that the multiplication in cell numbers is restricted by 
crowding effects. Biochemically, these may be due to lack of nutrients, shortage of oxygen, change in pH or 
the production of inhibitors, for example. Whatever the cause, the cells are interacting between them. Since 
each cell can interact with y others, there are y^ possibilities in total. This suggests that, in Eq. (I3.9t . we should 
put: 



N{t,y{t)} = ayit)-py{t)\ (3.15) 

where a and /3 are positive constants. The term involving a is the same as before and takes into account the 
increase due to division. The term containing f5 represents the inhibition on growth causes by crowding. With 
the substitution of Eq. ( I3.15t toward Eq. ( 13. 9t we have: 

^ = ay-l5y\ (3.16) 



which is called the differential equation of logistics. In the growth fraction paradigm Eq. (I3.7I) . the equation 
equivalent to Eq. (13.161) is: 



dt 

where S = a/j5. 



^ = ayil-y/S), (3.17) 



If we integrate Eq. (I3.16t from to f, we obtain: 

Jo"' ~ Jv 



t 



y{0) ay-lix^ ' 

lr.vW/l_ P 
aJy{0)\y W^J^^y' 

«'"lv(0){/i>.(0-a} 



(3.18) 



Hence, solving for y{t), we have: 



y{t){MO)-a} 

yit) 



mt)-a}y{0)e''' 

«y(0) 

" /5.y(0)+{a-/Jy(0)}e-«' ■ 



(3.19) 



which is known as the logistic law of growth. In terms of the carrying capacity S = a/P, Eq. (I3.19t takes the 
next form: 



syjo) 



(3.20) 



y{0)+{S-y{0)}e-"' " 

The logistic curve is used to model a great variety of physical situations in which growth of a quantity is 
"self-limited", that is, the growth rate of the quantity depends on the size of the quantity in such a way that if 
the quantity grows beyond a certain level, the growth rate decreases. The logistic model nicely describes the 
behaviour of certain types of growth in business, economics, populations and sales forecasts l29l . 
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Figure 3.2: Logistic curve with a = 3, j8 = 1, y{0) = 1. 



The curve of logistic growth is shown in figure ( I3.2I ). assuming that a > jSy(O). The curve rises steadily from 
the value y(0) at ? = to an eventual value of a/j3 , there being neither maxima nor minima in the curve. There 
is, however, a point of inflexion where the curve crosses its tangent att = to where: 

andy(fo) = a/2j8. 

Notice that the final value a/p of y does not involve v(0), so that, no matter what the initial size of the 
population, its final size is always the same and does not depend on the starting size of the population. 

In 1838, Verhulst proposed this model as a description of population growth. The model had been virtually 
forgotten until Pearl "rediscovered" it years later. Since then it has been often used as a point of departure for 
more advance population models. In 1945, Rashevsky, one of the founders of modern mathematical biology, 
arrived to the logistic model by considering tumor growth. Interestingly, the logistic model was used for fitting 
to tumor growth data much less frequently than the Gompertz model. On the other hand, the logistic model has 
been used in kinetics models describing immune response to tumor, where it has served as a mathematically 



simple description of immunologically unaffected tumor growth. Similarly, the logistic model has been used 
in models for chemotherapy optimization l28ll . 

It is important to remember that the logistic law assumes that all cells divide at the same rate, and this is not 
always true. There are types in which some cells divide faster than others. Whether the logistic law can be 
applied still depends upon the differences between the various rates of division present. If the rates are not too 
fai- apart it is probably feasible to take a as their average. For greater deviations may be necessary to adopt a 
model in which the statistics of the number of cells of a given age and type at a given time play a part t29.1 . 

The immediate generalization of the logistic model Eq. (13.161) is: 



dt 



ay-Py", v > 1, 



(3.22) 



with solution 



vU«(i-v); 



a 



(3.23) 



Function (I3.23t is often designated as the Richard function. The solution of Eq. (13.221) has been thoroughly 
discussed by Fletcher. Interestingly, when this model was fitted to tumor growth data with jq, a, jS, and v as 
free parameters, in most cases it was found that v \. Clearly, v cannot be exactly 1, because then (I3.22t 
would describe unrestricted exponential growth. However, if (I3.22t is reparametrized somewhat peculiarly as: 



dy 
dt 



a + - 



y- 



Ty 



ay — by 



v-l 



1 



then in the limit v 



ly" v-i 

1 one obtains the Gompertz model dy/dt 



1 



(3.24) 



ay — byXny, using the general result: 
limv^o ^-7^ = Inc.. The result that fitting to data yielded v 1 can be interpreted as a clear indication that the 
Gompertz model is a much more adequate description of tumor growth kinetics than is the logistic model l28l . 



3.2.3 Von Bertalanffy Growth 

The combination of the chemical kinetics paradigm and the principle of allometry led von Bertalanffy to 
formulate the model of organismic growth represented by the equation: 



At >0, 



V >0. 



(3.25) 



It was shown that for any /i and v the solution of Eq. (I3.25t can not be expressed in terms of elementary 
functions, but in terms of the modified beta-function. 



P{x,r,s) 



1/2 



(1 



uY ^du 



(3.26) 



and its inverse. 



The model characterized by /i = 2/3 and v = 1 is based on the so called "surface rule", which is often named 
von Bertalanffy model. The underlying notion is that the anabolic growth rate is proportional to the surface 
ai"ea (expressed as j^/^ where y is interpreted as volume), and the catabolic growth rate is proportional to the 
volume itself. Another especial case of Eq. (13.251) is the generalized logistic model with /i = 1 , presented by 
Eq. (I3.22t . and its counterpart with v = 1: 



dy 
dt 



ay^-py, pL<\. 



(3.27) 



The solution of this equation is of the same form as Eq. (I3.23t . because Eq. (I3.27t can be formally transformed 
into Eq. 



(I3.22t by parameter redefinition, as clearly presented by Fletcher. Obviously, model Eq. (13.27 
contains the von Bertalanffy "surface rule" model. 



Returning to the general model Eq. (I3.25t . we wish to point out its not obvious relationship to the Gompertz 
model. Similarly to the generalized logistic model, Eq. (I3.25t can be reparametrized into: 

(3.28) 

In the limit £ ^ 0, one then obtains the so called "generalized Gompertz model": 

^=ay>'-byHny, (3.29) 
at 

which for /I = 1 reduces to the original Gompertz model. In practice, this means that tumor growth data 
described by the generalized von Bertalanffy model with /i 1, v w 1, are described also by the Gompertz 
model lEH. 

3.2.4 Gompertz-Makeham Growth 

In the paradigm of chemical kinetics (see Eq. (I3.4t ). the equation 

^ = - iSjlnj, ^(0) = JO > 0, (3.30) 

has the Gompertz growth formula as the unique solution. The growth rate ay reflects the Malthusian law with 
clear interpretation, but the degradation rate lacks any such interpretation. 

In the growth fraction paradigm (Eq. ( 13.71 )). the equation equivalent to Eq. ( I3.30t is obtained for g{z) = — Inz, 
i.e. 

^ = -Tying), y{0)=y^>Q. (3.31) 

Thus the growth fraction g{z) is the simplest possible elementary transcendental function which obeys 
g{z) G [0, 1) for z G (0, 1] withg(l) = 0. Besides the simplicity argument, there is not an obvious interpretation 
of the growth fraction function. The solution of Eq. (I3.31t and Eq. (I3.30t reads: 

y = y^e^a/fi-\x,yo){\-e-P') 

y = y^e^n(S/y,){\-e-y) (3 32) 



Comparison of (I3.30t and (I3.31t yields interesting relations among parameters: 

j3 = y, a = yXnS. (3.33) 

These relations suggest that the inherent growth rate constant 7 (the rate constant for um^estricted growth, i.e., 
5 ^ 00 ) is equal to the degradation rate constant /3 and yet 7 is also proportional to the Malthusian growth rate 
constant a. This indicates that the Gompertzian growth is regulated by the parameter 7 which controls both 
growth and degradation l28ll . 



If we start from Eq. (13.311) and declare the growth fraction a new time dependent variable: 



Kj)=ln(-). (3.34) 



'S' \y 



The solution (I3.32t satisfies also the system of equations: 

I " (3-35) 

with initial conditions j(0) = jo and x{Q) = ln(5'/3'o). From here, it is clear that the parameter 7 is at the same 
time the inherent growth rate constant and the rate constant for the temporal decrease of the growth fraction. 



This certainly is a peculiarity of the Gompertz model which supports the idea that the single pai^ameter a 
controls an inhibitory feedback mechanism operating in tumors. Beyond this and beyond the transparent 
structure of Eq. (I3.35t . that has a simple interpretation, other fundamental insights are not apparent. Another 
possibility to present the Gompertz model as a system of two differential equations is based on the introduction 
of the effective growth rate x\ = yjc as a variable: 

r rfv 

Ir'''^' (3.36) 

This system of equations is interpreted as describing exponential growth with exponential retardation. 
However, this can be inferred directly from Eq. (13.301) . 



3.2.5 Mathematical Properties and Comparison Between Logistic and Gompertz Growth 



It is convenient to write equation Eq. (13.321) as: 



^a—h.x 



y = ce-' , (3.37) 



in which c and b are essentially positive quantities. From Eq. (I3.37t it is clear that as x becomes negatively 
infinity y will approach zero, and as x becomes positively infinity y will approach c. Differentiating Eq. (13.371) 
we have: 

^=cbe"-''^e-'^"-''=bye"-''-\ (3.38) 
ax 

and it is apparent that the slope is always positive for finite values of x, and approaches zero for infinite values 
of X. Differentiating again: 

= b^ye"-''''{e"-'''' - 1), (3.39) 

and we obtain the point of inflection in: 

a c 

x=-- y = -, (3.40) 
b e 

or approximately, when 37 % of the final growth has been reached. Therefore, when we desire to fit growth 
data which show a point of inflection in the early part of the growth cycle, we may use the Gompertz curve 
with the expectation that the approximation to the data will be good. Notice Figure 1, which shows the form 
of the curve for the case c = l,fl: = 0, Z7=l; there ai^e also shown the logistic and the first derivative of the 
Gompertz curve 1301 . 

The logistic possesses the same number of constants as the Gompertz curve, but has the point of inflection 
mid-way between the asymptotes. It is described by the following equation: 

^ = TT^- (3.41) 

It has been found useful to add a constant term to the logistic, giving it a lower asymptote different from zero: 

This procedure is equally applicable to the Gompertz curve giving: 

y = d + ce-'"'''\ (3.43) 



The Gompertz curve and the logistic possess similar properties which make them useful for the empirical rep- 
resentation of growth phenomena. Each curve has three arbitrary constants, which corresponds essentially to 




Figure 3.3: Gompertz curve and its first derivative, and tlie logistic curve, with c = \, a = 0, b = \. 



the upper asymptote, the time origin, and the time unit or rate constant. In each curve, the degree of skewness, 
as measured by the relation of the ordinate at the point of inflection to the distance between the asymptote, is 
fixed Oni. 

To illustrate the mathematical properties of the Gompertz and logistic curves, the table from ll30l has been 
reproduced on the next page. 

The Gompertz equation is used as a predictive tool in demography [31 1. However, the Gompertz law of expo- 
nential increase in mortality rates with ages is observed in many other biological species, such as rats, mice, 
fruit flies and flour beetles 1^ . not only on humans, and, therefore, some general theoretical explanation for 
this phenomenon is required. Furthermore, it often fits growth of organisms, organs and tumours. Despite 
numerous attempts, no consensus has been forged about the biological foundation of the broad applicability of 
the model (331. 
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Table 3.1: Mathematical properties of Gompertz and logistic curves. 
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Figure 3.4: A simple feedback control loop. 



Appendix A: Control Theory Fundamentals 



Control theory deals with the behaviour of dynamical systems over time. In a few words, is the mathematical 
study of how to manipulate the parameters affecting the behaviour of a system to produce the desired or op- 
timal outcome. Control theory plays an important role in the design of manufacturing processes in industry, 
robotics, transportation, and biology, among other apphcations. Some of its basic concepts are the following: 

System: set of elements that act in coordination to perform some objective. 

Plant, P: is the physical element that one desires to control. Some examples are motors, ovens, navigation 
systems, bioreactors, etc. 

Output signal, y{t): is the variable that one wishes to control (position, velocity, pressure, temperature, etc). 
Is also called control variable. 

Reference Signal, r{t): is the desired value for the output signal to reach. 
Error, e{t): the difference between the reference signal and the real output signal. 

Control signal, c{t): is the signal produced by the controller C in order to modify the control variable in such 
a way that the eiTor decreases. 

Process: steps that drive us to certain result. 

Perturbation: a signal affecting the output of the system, deviating it from the desired value. 

Sensor: device that turns the value of certain physical quantity (pressure, temperature, flow, etc.) into an 
electrical signal codified in analogic or digital forms. 

Closed-loop controller: the output of the system y{t) is compared to the reference value r{t), through the 
measurement performed by a sensor. The controller then takes the difference between the reference and the 
output, the error e{t), to change the inputs u{t) to the system under control. Is known as feedback control. 

Open-loop controller: the output signal y{t) is not monitored to generate a control signal c{t). There is no 
direct connection between the output of the system and its input u{t). One of the main disadvantages of this 
type of controller is the lack of sensitivity to the dynamics of the system under control. 

Stability: means that for any bounded input over any amount of time, the output will also be bounded. This is 
known as BIBO stability. If a system is BIBO stable then the output cannot diverge if the input remains finite. 

The most simple closed-loop controller is a so-called single-input-single-output (SISO) control system, and 
is presented in Fig. 13.41 Examples where one or more variables can contain more than a value (MIMO, i.e. 
Multi-Input-Multi-Output - for example when outputs to be controlled are two or more) are frequent. In such 
cases variables are represented through vectors instead of simple scalar values. 

If we assume the controller C and the plant P are linear and time-invariant (i.e.: elements of their transfer 
function C{s) and P{s) do not depend on time), we can analyze the system shown in the Fig. I3.4l bv using the 



Laplace transform on the variables. This gives us the following relations: 



Y{s) =P{s)U{s) 



(3.44) 



U{s) =C{s)E{s) 



(3.45) 



E{s) =R{s)-Y{s) 



(3.46) 



Solving for Y(s) in terms of R(s), we obtain: 




l+P{s)C{s) 



P{s)C{s) 



) 



(3.47) 



The term 



is refen^ed to as the transfer function of the system. If we can ensure P{s)C{s) >> 1, i.e. it 



\+P(s)C{s) 



has very great norm with each value of s, then Y{s) is approximately equal to R{s). This means we control the 
output by simply setting the reference. 

Controllability and observability are main issues in the analysis of system before decide the best control 
strategy to be applied. Controllability is related to the possibility to force the system in a particular state 
by using an appropriate control signal. If a state is not controllable, then no signal will ever be able to force 
the system to reach a level of controllability. Observability instead is related to the possibility to"observe", 
through output measurements, the system occupying a state. If a state is not observable, the controller will 
never be able to correct the closed-loop behaviour if such a state is not desirable. 

Every control system must guarantee first the stability of the closed-loop behaviour. For linear systems, this 
can be obtained directly placing the poles. The behaviour of a non-linear system is not expressible as a linear 
function of its state or input variables, so non-linear control systems used instead specifical theories (normally 
based on Lyapunov Theory) to ensure stability without regard to inner dynamics of the systems. The possibility 
to fulfill different specifications varies from the model considered and/or the control strategy chosen. 
Solutions to problems of uncontrollable or unobservable system include adding actuators and sensors. 

An observer is an auxiliary dynamical system which uses the available measurement on the system in order to 
provide an estimate x of the state of the system. The dynamical nature of an observer means that the estimates 
of the state variable are provided on line. By an adaptive scheme we mean an observer that is able to provide 
an estimate state even in face of parameter uncertainties. 



http://en.wikipedia.org/wLki/Control_theory 
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